[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-07-06 Thread Christian Wuerdig
Ceph on a single host makes little to no sense. You're better off running
something like ZFS.

On Tue, 6 Jul 2021 at 23:52, Wladimir Mutel  wrote:

> I started my experimental 1-host/8-HDDs setup in 2018 with Luminous,
> and I read
> https://ceph.io/community/new-luminous-erasure-coding-rbd-cephfs/ ,
> which got me interested in using Bluestore and rewritable EC
> pools for RBD data.
> I have about 22 TiB of raw storage, and ceph df shows this:
>
> --- RAW STORAGE ---
> CLASS  SIZE    AVAIL    USED    RAW USED  %RAW USED
> hdd    22 TiB  2.7 TiB  19 TiB    19 TiB      87.78
> TOTAL  22 TiB  2.7 TiB  19 TiB    19 TiB      87.78
>
> --- POOLS ---
> POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> jerasure21              1  256  9.0 TiB    2.32M   13 TiB  97.06    276 GiB
> libvirt                 2  128  1.5 TiB  413.60k  4.5 TiB  91.77    140 GiB
> rbd                     3   32  798 KiB        5  2.7 MiB      0    138 GiB
> iso                     4   32  2.3 MiB       10  8.0 MiB      0    138 GiB
> device_health_metrics   5    1   31 MiB        9   94 MiB   0.02    138 GiB
>
> If I add USED for libvirt and jerasure21, I get 17.5 TiB, while 2.7 TiB
> is shown as RAW STORAGE/AVAIL.
> The sum of POOLS/MAX AVAIL is about 840 GiB, so where is the other
> 2.7 - 0.840 =~ 1.86 TiB ???
> Or, put differently, where is my (RAW STORAGE/RAW USED) -
> (SUM(POOLS/USED)) = 19 - 17.5 = 1.5 TiB ?
>
> As it does not seem I will get any more hosts for this setup,
> I am seriously considering tearing down this Ceph cluster
> and instead setting up a Btrfs filesystem storing qcow2 images served
> over iSCSI,
> which looks simpler to me for a single-host situation.
>
> Josh Baergen wrote:
> > Hey Wladimir,
> >
> > I actually don't know where this is referenced in the docs, if anywhere.
> Googling around shows many people discovering this overhead the hard way on
> ceph-users.
> >
> > I also don't know the rbd journaling mechanism in enough depth to
> comment on whether it could be causing this issue for you. Are you seeing a
> high
> > allocated:stored ratio on your cluster?
> >
> > Josh
> >
> > On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel <m...@mwg.dp.ua> wrote:
> >
> > Dear Mr Baergen,
> >
> > thanks a lot for your very concise explanation,
> > however I would like to learn more why default Bluestore alloc.size
> causes such a big storage overhead,
> > and where in the Ceph docs it is explained how and what to watch for
> to avoid hitting this phenomenon again and again.
> > I have a feeling this is what I get on my experimental Ceph setup
> with simplest JErasure 2+1 data pool.
> > Could it be caused by journaled RBD writes to EC data-pool ?
> >
> > Josh Baergen wrote:
> >  > Hey Arkadiy,
> >  >
> >  > If the OSDs are on HDDs and were created with the default
> >  > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus,
> then in
> >  > effect data will be allocated from the pool in 640KiB chunks
> (64KiB *
> >  > (k+m)). 5.36M objects taking up 501GiB is an average object size
> of 98KiB
> >  > which results in a ratio of 6.53:1 allocated:stored, which is
> pretty close
> >  > to the 7:1 observed.
> >  >
> >  > If my assumption about your configuration is correct, then the
> only way to
> >  > fix this is to adjust bluestore_min_alloc_size_hdd and recreate
> all your
> >  > OSDs, which will take a while...
> >  >
> >  > Josh
> >  >
> >  > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <e...@ethaniel.com> wrote:
> >  >
> >  >> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED shows
> >  >> *3.5 TiB* (7 times higher!):
> >  >>
> >  >> root@ceph-01:~# ceph df
> >  >> --- RAW STORAGE ---
> >  >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> >  >> hdd    196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
> >  >> TOTAL  196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
> >  >>
> >  >> --- POOLS ---
> >  >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> >  >> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
> >  >> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
> >  >> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
> >  >> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
> >  >> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
> >  >> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
> >  >>
> >  >> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
> >  >>
> >  >> The *default.rgw.buckets.dat

[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-07-06 Thread Josh Baergen
Oh, I just read your message again, and I see that I didn't answer your
question. :D I admit I don't know how MAX AVAIL is calculated, and whether
it takes things like imbalance into account (it might).

Josh

On Tue, Jul 6, 2021 at 7:41 AM Josh Baergen 
wrote:

> Hey Wladimir,
>
> That output looks like it's from Nautilus or later. My understanding is
> that the USED column is in raw bytes, whereas STORED is "user" bytes. If
> you're using EC 2:1 for all of those pools, I would expect USED to be at
> least 1.5x STORED, which looks to be the case for jerasure21. Perhaps your
> libvirt pool is 3x replicated, in which case the numbers add up as well.
>
> Josh
>
> On Tue, Jul 6, 2021 at 5:51 AM Wladimir Mutel  wrote:
>
>> I started my experimental 1-host/8-HDDs setup in 2018 with Luminous,
>> and I read
>> https://ceph.io/community/new-luminous-erasure-coding-rbd-cephfs/ ,
>> which got me interested in using Bluestore and rewritable EC
>> pools for RBD data.
>> I have about 22 TiB of raw storage, and ceph df shows this:
>>
>> --- RAW STORAGE ---
>> CLASS  SIZE    AVAIL    USED    RAW USED  %RAW USED
>> hdd    22 TiB  2.7 TiB  19 TiB    19 TiB      87.78
>> TOTAL  22 TiB  2.7 TiB  19 TiB    19 TiB      87.78
>>
>> --- POOLS ---
>> POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
>> jerasure21              1  256  9.0 TiB    2.32M   13 TiB  97.06    276 GiB
>> libvirt                 2  128  1.5 TiB  413.60k  4.5 TiB  91.77    140 GiB
>> rbd                     3   32  798 KiB        5  2.7 MiB      0    138 GiB
>> iso                     4   32  2.3 MiB       10  8.0 MiB      0    138 GiB
>> device_health_metrics   5    1   31 MiB        9   94 MiB   0.02    138 GiB
>>
>> If I add USED for libvirt and jerasure21, I get 17.5 TiB, while 2.7 TiB
>> is shown as RAW STORAGE/AVAIL.
>> The sum of POOLS/MAX AVAIL is about 840 GiB, so where is the other
>> 2.7 - 0.840 =~ 1.86 TiB ???
>> Or, put differently, where is my (RAW STORAGE/RAW USED) -
>> (SUM(POOLS/USED)) = 19 - 17.5 = 1.5 TiB ?
>>
>> As it does not seem I will get any more hosts for this setup,
>> I am seriously considering tearing down this Ceph cluster
>> and instead setting up a Btrfs filesystem storing qcow2 images served
>> over iSCSI,
>> which looks simpler to me for a single-host situation.
>>
>> Josh Baergen wrote:
>> > Hey Wladimir,
>> >
>> > I actually don't know where this is referenced in the docs, if
>> anywhere. Googling around shows many people discovering this overhead the
>> hard way on ceph-users.
>> >
>> > I also don't know the rbd journaling mechanism in enough depth to
>> comment on whether it could be causing this issue for you. Are you seeing a
>> high
>> > allocated:stored ratio on your cluster?
>> >
>> > Josh
>> >
>> > On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel <m...@mwg.dp.ua> wrote:
>> >
>> > Dear Mr Baergen,
>> >
>> > thanks a lot for your very concise explanation,
>> > however I would like to learn more why default Bluestore alloc.size
>> causes such a big storage overhead,
>> > and where in the Ceph docs it is explained how and what to watch
>> for to avoid hitting this phenomenon again and again.
>> > I have a feeling this is what I get on my experimental Ceph setup
>> with simplest JErasure 2+1 data pool.
>> > Could it be caused by journaled RBD writes to EC data-pool ?
>> >
>> > Josh Baergen wrote:
>> >  > Hey Arkadiy,
>> >  >
>> >  > If the OSDs are on HDDs and were created with the default
>> >  > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus,
>> then in
>> >  > effect data will be allocated from the pool in 640KiB chunks
>> (64KiB *
>> >  > (k+m)). 5.36M objects taking up 501GiB is an average object size
>> of 98KiB
>> >  > which results in a ratio of 6.53:1 allocated:stored, which is
>> pretty close
>> >  > to the 7:1 observed.
>> >  >
>> >  > If my assumption about your configuration is correct, then the
>> only way to
>> >  > fix this is to adjust bluestore_min_alloc_size_hdd and recreate
>> all your
>> >  > OSDs, which will take a while...
>> >  >
>> >  > Josh
>> >  >
>> >  > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <e...@ethaniel.com> wrote:
>> >  >
>> >  >> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED shows
>> >  >> *3.5 TiB* (7 times higher!):
>> >  >>
>> >  >> root@ceph-01:~# ceph df
>> >  >> --- RAW STORAGE ---
>> >  >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
>> >  >> hdd    196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
>> >  >> TOTAL  196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
>> >  >>
>> >  >> --- POOLS ---
>> >  >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
>> >  >> device_health_metrics       1    1   19 KiB       12   56 K

[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-07-06 Thread Josh Baergen
Hey Wladimir,

That output looks like it's from Nautilus or later. My understanding is
that the USED column is in raw bytes, whereas STORED is "user" bytes. If
you're using EC 2:1 for all of those pools, I would expect USED to be at
least 1.5x STORED, which looks to be the case for jerasure21. Perhaps your
libvirt pool is 3x replicated, in which case the numbers add up as well.
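
As a quick sanity check of that arithmetic, here is a small Python sketch (my own
back-of-envelope, assuming jerasure21 really is EC k=2/m=1 and libvirt really is a
size-3 replicated pool, which the thread only infers):

# Expected raw USED for a given STORED figure, ignoring BlueStore
# min_alloc_size padding and other per-object overhead.
def raw_used(stored_tib, k=None, m=None, replicas=None):
    factor = (k + m) / k if k else replicas
    return stored_tib * factor

print(raw_used(9.0, k=2, m=1))    # ~13.5 TiB, close to the 13 TiB shown for jerasure21
print(raw_used(1.5, replicas=3))  # 4.5 TiB, exactly what the libvirt pool shows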

Josh

On Tue, Jul 6, 2021 at 5:51 AM Wladimir Mutel  wrote:

> I started my experimental 1-host/8-HDDs setup in 2018 with Luminous,
> and I read
> https://ceph.io/community/new-luminous-erasure-coding-rbd-cephfs/ ,
> which got me interested in using Bluestore and rewritable EC
> pools for RBD data.
> I have about 22 TiB of raw storage, and ceph df shows this:
>
> --- RAW STORAGE ---
> CLASS  SIZE    AVAIL    USED    RAW USED  %RAW USED
> hdd    22 TiB  2.7 TiB  19 TiB    19 TiB      87.78
> TOTAL  22 TiB  2.7 TiB  19 TiB    19 TiB      87.78
>
> --- POOLS ---
> POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> jerasure21              1  256  9.0 TiB    2.32M   13 TiB  97.06    276 GiB
> libvirt                 2  128  1.5 TiB  413.60k  4.5 TiB  91.77    140 GiB
> rbd                     3   32  798 KiB        5  2.7 MiB      0    138 GiB
> iso                     4   32  2.3 MiB       10  8.0 MiB      0    138 GiB
> device_health_metrics   5    1   31 MiB        9   94 MiB   0.02    138 GiB
>
> If I add USED for libvirt and jerasure21, I get 17.5 TiB, while 2.7 TiB
> is shown as RAW STORAGE/AVAIL.
> The sum of POOLS/MAX AVAIL is about 840 GiB, so where is the other
> 2.7 - 0.840 =~ 1.86 TiB ???
> Or, put differently, where is my (RAW STORAGE/RAW USED) -
> (SUM(POOLS/USED)) = 19 - 17.5 = 1.5 TiB ?
>
> As it does not seem I will get any more hosts for this setup,
> I am seriously considering tearing down this Ceph cluster
> and instead setting up a Btrfs filesystem storing qcow2 images served
> over iSCSI,
> which looks simpler to me for a single-host situation.
>
> Josh Baergen wrote:
> > Hey Wladimir,
> >
> > I actually don't know where this is referenced in the docs, if anywhere.
> Googling around shows many people discovering this overhead the hard way on
> ceph-users.
> >
> > I also don't know the rbd journaling mechanism in enough depth to
> comment on whether it could be causing this issue for you. Are you seeing a
> high
> > allocated:stored ratio on your cluster?
> >
> > Josh
> >
> > On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel <m...@mwg.dp.ua> wrote:
> >
> > Dear Mr Baergen,
> >
> > thanks a lot for your very concise explanation,
> > however I would like to learn more why default Bluestore alloc.size
> causes such a big storage overhead,
> > and where in the Ceph docs it is explained how and what to watch for
> to avoid hitting this phenomenon again and again.
> > I have a feeling this is what I get on my experimental Ceph setup
> with simplest JErasure 2+1 data pool.
> > Could it be caused by journaled RBD writes to EC data-pool ?
> >
> > Josh Baergen wrote:
> >  > Hey Arkadiy,
> >  >
> >  > If the OSDs are on HDDs and were created with the default
> >  > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus,
> then in
> >  > effect data will be allocated from the pool in 640KiB chunks
> (64KiB *
> >  > (k+m)). 5.36M objects taking up 501GiB is an average object size
> of 98KiB
> >  > which results in a ratio of 6.53:1 allocated:stored, which is
> pretty close
> >  > to the 7:1 observed.
> >  >
> >  > If my assumption about your configuration is correct, then the
> only way to
> >  > fix this is to adjust bluestore_min_alloc_size_hdd and recreate
> all your
> >  > OSDs, which will take a while...
> >  >
> >  > Josh
> >  >
> >  > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <e...@ethaniel.com> wrote:
> >  >
> >  >> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED shows
> >  >> *3.5 TiB* (7 times higher!):
> >  >>
> >  >> root@ceph-01:~# ceph df
> >  >> --- RAW STORAGE ---
> >  >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> >  >> hdd    196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
> >  >> TOTAL  196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
> >  >>
> >  >> --- POOLS ---
> >  >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> >  >> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
> >  >> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
> >  >> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
> >  >> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
> >  >> default.rgw.meta            5    8  4.8 KiB       11  1.9 M

[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-07-06 Thread Anthony D'Atri

> Oh, I just read your message again, and I see that I didn't answer your
> question. :D I admit I don't know how MAX AVAIL is calculated, and whether
> it takes things like imbalance into account (it might).

It does. It’s calculated relative to the most-full OSD in the pool, and the
full_ratio is also applied to account for the space it effectively reserves.
Significant imbalance in OSD fullness will make this figure smaller than one
would expect from the amount of unused raw capacity. This is subtle and
often misunderstood.
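
A rough sketch of that logic in Python (my simplification, not the exact mgr
computation: it ignores CRUSH weights and assumes every OSD carries an equal
share of the pool; the per-OSD size/used figures would come from ceph osd df):

# MAX AVAIL is bounded by the most-full OSD: the pool can only grow until
# that OSD reaches full_ratio, so its headroom sets the limit for everyone.
def est_max_avail(osds, data_multiplier, full_ratio=0.95):
    # osds: list of (size_bytes, used_bytes); data_multiplier: 3 for a 3x
    # replicated pool, (k+m)/k for an EC pool.
    min_headroom = min(size * full_ratio - used for size, used in osds)
    raw_avail = min_headroom * len(osds)  # every OSD limited by the fullest one
    return raw_avail / data_multiplier    # convert raw bytes to "user" bytes

# One nearly full OSD drags MAX AVAIL down even if the others are mostly empty.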


>>> Sum of POOLS/MAX AVAIL is about 840 GiB

Those numbers can’t be meaningfully summed, since most or all of those pools 
are sharing the same OSDs.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-07-06 Thread Wladimir Mutel

I started my experimental 1-host/8-HDDs setup in 2018 with Luminous,
and I read
https://ceph.io/community/new-luminous-erasure-coding-rbd-cephfs/ ,
which got me interested in using Bluestore and rewritable EC pools for
RBD data.
I have about 22 TiB of raw storage, and ceph df shows this:

--- RAW STORAGE ---
CLASS  SIZE    AVAIL    USED    RAW USED  %RAW USED
hdd    22 TiB  2.7 TiB  19 TiB    19 TiB      87.78
TOTAL  22 TiB  2.7 TiB  19 TiB    19 TiB      87.78

--- POOLS ---
POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
jerasure21              1  256  9.0 TiB    2.32M   13 TiB  97.06    276 GiB
libvirt                 2  128  1.5 TiB  413.60k  4.5 TiB  91.77    140 GiB
rbd                     3   32  798 KiB        5  2.7 MiB      0    138 GiB
iso                     4   32  2.3 MiB       10  8.0 MiB      0    138 GiB
device_health_metrics   5    1   31 MiB        9   94 MiB   0.02    138 GiB

If I add USED for libvirt and jerasure21, I get 17.5 TiB, while 2.7 TiB
is shown as RAW STORAGE/AVAIL.
The sum of POOLS/MAX AVAIL is about 840 GiB, so where is the other
2.7 - 0.840 =~ 1.86 TiB ???
Or, put differently, where is my (RAW STORAGE/RAW USED) -
(SUM(POOLS/USED)) = 19 - 17.5 = 1.5 TiB ?

As it does not seem I will get any more hosts for this setup,
I am seriously considering tearing down this Ceph cluster
and instead setting up a Btrfs filesystem storing qcow2 images served over iSCSI,
which looks simpler to me for a single-host situation.

Josh Baergen wrote:

Hey Wladimir,

I actually don't know where this is referenced in the docs, if anywhere. 
Googling around shows many people discovering this overhead the hard way on 
ceph-users.

I also don't know the rbd journaling mechanism in enough depth to comment on whether it could be causing this issue for you. Are you seeing a high 
allocated:stored ratio on your cluster?


Josh

On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel <m...@mwg.dp.ua> wrote:

Dear Mr Baergen,

thanks a lot for your very concise explanation,
however I would like to learn more why default Bluestore alloc.size causes 
such a big storage overhead,
and where in the Ceph docs it is explained how and what to watch for to 
avoid hitting this phenomenon again and again.
I have a feeling this is what I get on my experimental Ceph setup with 
simplest JErasure 2+1 data pool.
Could it be caused by journaled RBD writes to EC data-pool ?

Josh Baergen wrote:
 > Hey Arkadiy,
 >
 > If the OSDs are on HDDs and were created with the default
 > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then in
 > effect data will be allocated from the pool in 640KiB chunks (64KiB *
 > (k+m)). 5.36M objects taking up 501GiB is an average object size of 98KiB
 > which results in a ratio of 6.53:1 allocated:stored, which is pretty 
close
 > to the 7:1 observed.
 >
 > If my assumption about your configuration is correct, then the only way 
to
 > fix this is to adjust bluestore_min_alloc_size_hdd and recreate all your
 > OSDs, which will take a while...
 >
 > Josh
 >
 > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <e...@ethaniel.com> wrote:
 >
 >> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED shows
 >> *3.5 TiB* (7 times higher!):
 >>
 >> root@ceph-01:~# ceph df
 >> --- RAW STORAGE ---
 >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
 >> hdd    196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
 >> TOTAL  196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
 >>
 >> --- POOLS ---
 >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
 >> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
 >> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
 >> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
 >> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
 >> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
 >> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
 >>
 >> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
 >>
 >> The *default.rgw.buckets.data* pool is using erasure coding:
 >>
 >> root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
 >> crush-device-class=hdd
 >> crush-failure-domain=host
 >> crush-root=default
 >> jerasure-per-chunk-alignment=false
 >> k=6
 >> m=4
 >> plugin=jerasure
 >> technique=reed_sol_van
 >> w=8
 >>
 >> If anyone could help explain why it's using up 7 times more space, it 
would
 >> help a lot. Versioning is disabled.

[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-07-05 Thread Josh Baergen
Hey Wladimir,

I actually don't know where this is referenced in the docs, if anywhere.
Googling around shows many people discovering this overhead the hard way on
ceph-users.

I also don't know the rbd journaling mechanism in enough depth to comment
on whether it could be causing this issue for you. Are you seeing a high
allocated:stored ratio on your cluster?
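
One way to check that directly, as a sketch (it assumes you can reach the OSD's
admin socket on that host and that your build exposes the bluestore_allocated /
bluestore_stored perf counters — adjust the OSD id):

# Compare BlueStore's allocated vs. stored bytes on a single OSD to see
# how much min_alloc_size padding is costing.
import json, subprocess

perf = json.loads(subprocess.check_output(["ceph", "daemon", "osd.0", "perf", "dump"]))
bs = perf["bluestore"]
allocated, stored = bs["bluestore_allocated"], bs["bluestore_stored"]
print(f"allocated:stored = {allocated / stored:.2f}:1")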

Josh

On Sun, Jul 4, 2021 at 6:52 AM Wladimir Mutel  wrote:

> Dear Mr Baergen,
>
> thanks a lot for your very concise explanation,
> however I would like to learn more why default Bluestore alloc.size causes
> such a big storage overhead,
> and where in the Ceph docs it is explained how and what to watch for to
> avoid hitting this phenomenon again and again.
> I have a feeling this is what I get on my experimental Ceph setup with
> simplest JErasure 2+1 data pool.
> Could it be caused by journaled RBD writes to EC data-pool ?
>
> Josh Baergen wrote:
> > Hey Arkadiy,
> >
> > If the OSDs are on HDDs and were created with the default
> > bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then in
> > effect data will be allocated from the pool in 640KiB chunks (64KiB *
> > (k+m)). 5.36M objects taking up 501GiB is an average object size of 98KiB
> > which results in a ratio of 6.53:1 allocated:stored, which is pretty
> close
> > to the 7:1 observed.
> >
> > If my assumption about your configuration is correct, then the only way
> to
> > fix this is to adjust bluestore_min_alloc_size_hdd and recreate all your
> > OSDs, which will take a while...
> >
> > Josh
> >
> > On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev  wrote:
> >
> >> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED shows
> >> *3.5 TiB* (7 times higher!):
> >>
> >> root@ceph-01:~# ceph df
> >> --- RAW STORAGE ---
> >> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> >> hdd    196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
> >> TOTAL  196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
> >>
> >> --- POOLS ---
> >> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> >> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
> >> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
> >> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
> >> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
> >> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
> >> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
> >>
> >> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
> >>
> >> The *default.rgw.buckets.data* pool is using erasure coding:
> >>
> >> root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
> >> crush-device-class=hdd
> >> crush-failure-domain=host
> >> crush-root=default
> >> jerasure-per-chunk-alignment=false
> >> k=6
> >> m=4
> >> plugin=jerasure
> >> technique=reed_sol_van
> >> w=8
> >>
> >> If anyone could help explain why it's using up 7 times more space, it
> would
> >> help a lot. Versioning is disabled. ceph version 15.2.13 (octopus
> stable).
> >>
> >> Sincerely,
> >> Ark.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-07-04 Thread Wladimir Mutel

Dear Mr Baergen,

thanks a lot for your very concise explanation.
However, I would like to learn more about why the default Bluestore allocation size
causes such a big storage overhead,
and where in the Ceph docs it is explained what to watch for to avoid
hitting this phenomenon again and again.
I have a feeling this is what I am seeing on my experimental Ceph setup with the
simplest JErasure 2+1 data pool.
Could it be caused by journaled RBD writes to the EC data pool?
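
One quick thing to check, as a sketch (it only tells whether the journaling
feature is even enabled on the images; the pool name is assumed to be libvirt
here, adjust it to your setup):

# List RBD images and whether the journaling feature is enabled on each.
import json, subprocess

pool = "libvirt"  # assumed image/metadata pool name
for name in json.loads(subprocess.check_output(["rbd", "ls", "-p", pool, "--format", "json"])):
    info = json.loads(subprocess.check_output(["rbd", "info", f"{pool}/{name}", "--format", "json"]))
    print(name, "journaling" if "journaling" in info.get("features", []) else "no journaling")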

Josh Baergen wrote:

Hey Arkadiy,

If the OSDs are on HDDs and were created with the default
bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then in
effect data will be allocated from the pool in 640KiB chunks (64KiB *
(k+m)). 5.36M objects taking up 501GiB is an average object size of 98KiB
which results in a ratio of 6.53:1 allocated:stored, which is pretty close
to the 7:1 observed.

If my assumption about your configuration is correct, then the only way to
fix this is to adjust bluestore_min_alloc_size_hdd and recreate all your
OSDs, which will take a while...

Josh

On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev <e...@ethaniel.com> wrote:


The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED shows
*3.5 TiB* (7 times higher!):

root@ceph-01:~# ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd    196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
TOTAL  196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85

--- POOLS ---
POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
.rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB

default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB

The *default.rgw.buckets.data* pool is using erasure coding:

root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=6
m=4
plugin=jerasure
technique=reed_sol_van
w=8

If anyone could help explain why it's using up 7 times more space, it would
help a lot. Versioning is disabled. ceph version 15.2.13 (octopus stable).

Sincerely,
Ark.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-06-29 Thread Josh Baergen
Hey Arkadiy,

If the OSDs are on HDDs and were created with the default
bluestore_min_alloc_size_hdd, which is still 64KiB in Octopus, then in
effect data will be allocated from the pool in 640KiB chunks (64KiB *
(k+m)). 5.36M objects taking up 501GiB is an average object size of 98KiB
which results in a ratio of 6.53:1 allocated:stored, which is pretty close
to the 7:1 observed.
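
The arithmetic behind that, as a small Python sketch (it assumes every chunk of
every object is rounded up to a single 64KiB allocation unit, i.e. all objects
are small):

# Reproduce the ~6.53:1 estimate for a k=6, m=4 pool with 64KiB min_alloc.
MIN_ALLOC = 64 * 1024        # bluestore_min_alloc_size_hdd default in Octopus
K, M = 6, 4                  # from the erasure-code-profile quoted below
stored = 501 * 1024**3       # 501 GiB
objects = 5.36e6

avg_obj = stored / objects                      # ~98 KiB average object size
alloc_per_obj = (K + M) * MIN_ALLOC             # 640 KiB allocated per object
print(avg_obj / 1024, alloc_per_obj / avg_obj)  # ~98 KiB, ~6.5:1 allocated:stored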

If my assumption about your configuration is correct, then the only way to
fix this is to adjust bluestore_min_alloc_size_hdd and recreate all your
OSDs, which will take a while...

Josh

On Tue, Jun 29, 2021 at 3:07 PM Arkadiy Kulev  wrote:

> The pool *default.rgw.buckets.data* has *501 GiB* stored, but USED shows
> *3.5 TiB* (7 times higher!):
>
> root@ceph-01:~# ceph df
> --- RAW STORAGE ---
> CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
> hdd    196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
> TOTAL  196 TiB  193 TiB  3.5 TiB   3.6 TiB       1.85
>
> --- POOLS ---
> POOL                       ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
> device_health_metrics       1    1   19 KiB       12   56 KiB      0     61 TiB
> .rgw.root                   2   32  2.6 KiB        6  1.1 MiB      0     61 TiB
> default.rgw.log             3   32  168 KiB      210   13 MiB      0     61 TiB
> default.rgw.control         4   32      0 B        8      0 B      0     61 TiB
> default.rgw.meta            5    8  4.8 KiB       11  1.9 MiB      0     61 TiB
> default.rgw.buckets.index   6    8  1.6 GiB      211  4.7 GiB      0     61 TiB
>
> default.rgw.buckets.data   10  128  501 GiB    5.36M  3.5 TiB   1.90    110 TiB
>
> The *default.rgw.buckets.data* pool is using erasure coding:
>
> root@ceph-01:~# ceph osd erasure-code-profile get EC_RGW_HOST
> crush-device-class=hdd
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=6
> m=4
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> If anyone could help explain why it's using up 7 times more space, it would
> help a lot. Versioning is disabled. ceph version 15.2.13 (octopus stable).
>
> Sincerely,
> Ark.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph df (octopus) shows USED is 7 times higher than STORED in erasure coded pool

2021-06-29 Thread Arkadiy Kulev
Dear Josh, Thank you! I will be upgrading to Pacific and lowering
bluestore_min_alloc_size_hdd down to 4K.
Will report back with the results.
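
For reference, a sketch of setting the new value cluster-wide (assuming the
centralized config store is in use; otherwise the option goes into ceph.conf.
It is only honoured by OSDs created after it is set, so every existing OSD
still has to be destroyed and re-created):

# Lower the HDD min_alloc for OSDs that will be (re)created afterwards.
import subprocess

subprocess.run(["ceph", "config", "set", "osd", "bluestore_min_alloc_size_hdd", "4096"], check=True)
subprocess.run(["ceph", "config", "get", "osd", "bluestore_min_alloc_size_hdd"], check=True)  # verify
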
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io