[ceph-users] Re: Where has my capacity gone?

2021-01-26 Thread George Yil
Sorry for replying late :(. And thanks for the tips.

This is a fresh cluster. And I didn’t think data distribution would be a 
problem. Is this normal?

Below is the ceph osd df output. The related pool is HDD-only
(prod.rgw.buckets.data). I can see there is variance, but I couldn’t figure
out the reason. Is it because of the PG count, which I chose with help from
the pg-calculator? Or is this expected Ceph behaviour?

#ceph osd df
https://pastebin.ubuntu.com/p/ZmQZsGYpr7/ 


I am also sharing related cluster information. Any suggestion would be 
appreciated.
#ceph df
https://pastebin.ubuntu.com/p/sXpf99zhnV/ 


#ceph df detail
https://pastebin.ubuntu.com/p/dwvwBnnBmv/ 


#ceph osd pool ls detail
https://pastebin.ubuntu.com/p/c2KQD5CGMV/ 


#crush rules
https://pastebin.ubuntu.com/p/X6WsZhV3Zz/ 


Thanks.


> On 26 Jan 2021, at 11:18, Anthony D'Atri  wrote:
> 
> ceph osd df | sort -nk8
> 
>> On Jan 25, 2021, at 11:22 PM, George Yil  wrote:
>> 
>> Hi,
>> 
>> I have a ceph nautilus (14.2.9) cluster with 10 nodes. Each node has
>> 19x16TB disks attached.
>> 
>> I created radosgw pools. secondaryzone.rgw.buckets.data pool is configured
>> as EC 8+2 (jerasure).
>> ceph df shows 2.1PiB MAX AVAIL space.
>> 
>> Then I configured radosgw as a secondary zone and 100TiB of S3 data is
>> replicated.
>> 
>> But weirdly enough, ceph df shows 1.8PiB MAX AVAIL for the same pool, even
>> though only 100TiB of data has been written (ceph df confirms it). I cannot
>> figure out where 200TiB of capacity has gone.
>> 
>> Would someone please tell me what I am missing?
>> 
>> Thanks.
> 



[ceph-users] Re: Where has my capacity gone?

2021-01-26 Thread Josh Baergen
> I created radosgw pools. secondaryzone.rgw.buckets.data pool is
> configured as EC 8+2 (jerasure).

Did you override the default bluestore_min_alloc_size_hdd (64k in that
version IIRC) when creating your hdd OSDs? If not, all of the small objects
produced by that EC configuration will be leading to significant on-disk
allocation overhead.
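
A quick way to sanity-check this, just as a sketch (osd.0 is an example ID,
and the arithmetic assumes the 64k default with the EC 8+2 profile above):

# what a newly created OSD would use; note the value is baked in at mkfs
# time, so an existing OSD may differ if the config changed after creation
ceph daemon osd.0 config get bluestore_min_alloc_size_hdd

# rough arithmetic for one 128 KiB S3 object under EC 8+2:
# 8 data + 2 coding chunks of 16 KiB each, but every chunk occupies
# a full 64 KiB allocation unit on disk
echo $(( 10 * 16 ))   # KiB the chunks actually need -> 160
echo $(( 10 * 64 ))   # KiB allocated on disk        -> 640 (4x)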

Josh


[ceph-users] Re: Where has my capacity gone?

2021-01-26 Thread George Yil
I did not. Honestly, I was not aware of such a setting. Thanks for pointing
it out. And I hope this is not bad news.

May I ask if it can be changed dynamically, and whether any disadvantages
should be expected?

> On 27 Jan 2021, at 01:33, Josh Baergen  wrote:
> 
> > I created radosgw pools. secondaryzone.rgw.buckets.data pool is configured 
> > as EC 8+2 (jerasure).
> 
> Did you override the default bluestore_min_alloc_size_hdd (64k in that 
> version IIRC) when creating your hdd OSDs? If not, all of the small objects 
> produced by that EC configuration will be leading to significant on-disk 
> allocation overhead.
> 
> Josh


[ceph-users] Re: Where has my capacity gone?

2021-01-27 Thread George Yil
Thank you. This helps a lot.

> Josh Baergen  wrote (27 Jan 2021 17:08):
> 
> On Wed, Jan 27, 2021 at 12:24 AM George Yil  wrote:
>> May I ask if it can be changed dynamically, and whether any disadvantages
>> should be expected?
> 
> Unless there's some magic I'm unaware of, there is no way to
> dynamically change this. Each OSD must be recreated with the new
> min_alloc_size setting. In production systems this can be quite the
> chore, since the safest way to accomplish this is to drain the OSD
> (set it 'out', use CRUSH map changes, or use upmaps), recreate it, and
> then repopulate it. With automation this can run in the background.
> Given how much room you have currently you may be able to do this
> host-at-a-time by storing a host's data on the other hosts in a given
> rack (though I don't remember what your CRUSH tree looks like so maybe
> you can't do this and maintain host independence).
> 
> The downside is potentially more tracking metadata at the OSD level,
> though I understand that Nautilus has made improvements here. I'm not
> up to speed on the latest state in this area, though.
> 
> Josh


[ceph-users] Re: Where has my capacity gone?

2021-01-27 Thread Josh Baergen
On Wed, Jan 27, 2021 at 12:24 AM George Yil  wrote:
> May I ask if it can be changed dynamically, and whether any disadvantages
> should be expected?

Unless there's some magic I'm unaware of, there is no way to
dynamically change this. Each OSD must be recreated with the new
min_alloc_size setting. In production systems this can be quite the
chore, since the safest way to accomplish this is to drain the OSD
(set it 'out', use CRUSH map changes, or use upmaps), recreate it, and
then repopulate it. With automation this can run in the background.
Given how much room you have currently you may be able to do this
host-at-a-time by storing a host's data on the other hosts in a given
rack (though I don't remember what your CRUSH tree looks like so maybe
you can't do this and maintain host independence).
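
In command terms, the per-OSD cycle is roughly the sketch below (illustrative
only: osd.12, /dev/sdX and the 4K target are placeholders, and the recreate
step depends on how the OSDs were deployed; ceph-volume shown here):

# make newly created OSDs pick up the smaller allocation size at mkfs time
ceph config set osd bluestore_min_alloc_size_hdd 4096

# drain one OSD and wait for backfill to finish
ceph osd out 12
# ... wait until 'ceph osd safe-to-destroy 12' reports it is safe ...

# then recreate it in place, reusing the same OSD id
ceph osd destroy 12 --yes-i-really-mean-it
ceph-volume lvm zap /dev/sdX --destroy
ceph-volume lvm create --osd-id 12 --data /dev/sdX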

The downside is potentially more tracking metadata at the OSD level,
though I understand that Nautilus has made improvements here. I'm not
up to speed on the latest state in this area, though.

Josh


[ceph-users] Re: Where has my capacity gone?

2021-01-27 Thread George Yil
May I ask whether enabling pool compression would help with future space amplification?

> George Yil  wrote (27 Jan 2021 18:57):
> 
> Thank you. This helps a lot.
> 
>> Josh Baergen  wrote (27 Jan 2021 17:08):
>> 
>> On Wed, Jan 27, 2021 at 12:24 AM George Yil  wrote:
>>> May I ask if it can be changed dynamically, and whether any disadvantages
>>> should be expected?
>> 
>> Unless there's some magic I'm unaware of, there is no way to
>> dynamically change this. Each OSD must be recreated with the new
>> min_alloc_size setting. In production systems this can be quite the
>> chore, since the safest way to accomplish this is to drain the OSD
>> (set it 'out', use CRUSH map changes, or use upmaps), recreate it, and
>> then repopulate it. With automation this can run in the background.
>> Given how much room you have currently you may be able to do this
>> host-at-a-time by storing a host's data on the other hosts in a given
>> rack (though I don't remember what your CRUSH tree looks like so maybe
>> you can't do this and maintain host independence).
>> 
>> The downside is potentially more tracking metadata at the OSD level,
>> though I understand that Nautilus has made improvements here. I'm not
>> up to speed on the latest state in this area, though.
>> 
>> Josh


[ceph-users] Re: Where has my capacity gone?

2021-01-28 Thread George Yil
Hi Marc,

Thanks for participating. At first I thought this was an incorrect report and
that I might need to upgrade for a bugfix. But I couldn’t find such a bug
report, so I asked here.

When people shared their experiences, it appeared there may be two causes:
unbalanced OSDs or storage amplification.
As far as I understand, this is most likely storage amplification. Imbalance
seems less relevant since this is a fresh cluster, or I might be
misinterpreting ceph osd df: https://pastebin.ubuntu.com/p/ZmQZsGYpr7/


So I am trying to figure out the best way to change
bluestore_min_alloc_size_hdd. I also think pool compression could be a quick
solution for future data writes, but I am not 100% sure. Any ideas are more
than welcome.


> On 28 Jan 2021, at 12:29, Marc Roos  wrote:
> 
> Hi George,
> 
> Sorry for asking, maybe I skipped an email, but what eventually caused the
> 'incorrect' report on available storage?
> 
> 
> 



[ceph-users] Re: Where has my capacity gone?

2021-01-28 Thread Josh Baergen
Hi George,

> May I ask whether enabling pool compression would help with future space
> amplification?

If the amplification is indeed due to min_alloc_size, then I don't
think that compression will help. My understanding is that compression
is applied post-EC (and thus probably won't even activate due to the
small chunks), and that the compressed bits will still be stored on
disk in the same way as before (min_alloc_size still applies). More
info here: https://www.suse.com/support/kb/doc/?id=19629

It's possible, though, that turning on compression and tuning its
settings could reduce the overall number of blocks allocated, which
would compensate slightly for the amplification. To confirm that you'd
have to analyze the object sizes of your data set. There are also
pathological cases where perhaps most of your EC chunks are slightly
over 64K and by forcing them to compress (they won't by default) you
actually cut allocated blocks in half. Again, that would take analysis
to determine.
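
If you do experiment with it, the knobs are per-pool and can be changed at
runtime, roughly like this (a sketch only; the pool name, algorithm and
threshold are just examples, and it only affects data written afterwards):

ceph osd pool set prod.rgw.buckets.data compression_algorithm zstd
ceph osd pool set prod.rgw.buckets.data compression_mode aggressive
# as I understand it, chunks smaller than compression_min_blob_size are
# left uncompressed, which is why small EC chunks are skipped by default
ceph osd pool set prod.rgw.buckets.data compression_min_blob_size 4096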

Josh