One factor is RAM usage; IIRC that was the motivation for lowering the 
recommended ratio from 200 to 100.  Memory needs also increase during 
recovery and backfill.

When calculating, be sure to account for replicas:

ratio = (pgp_num x replication) / num_osds
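
To make that concrete, here is a minimal Python sketch of the same arithmetic. 
The cluster numbers below are made up for illustration, and with multiple 
pools you would sum pgp_num x replication across all pools before dividing:

    def pgs_per_osd(pgp_num, replication, num_osds):
        # Average number of PG replicas that land on each OSD
        return pgp_num * replication / num_osds

    # Hypothetical example: a 3x-replicated pool with pgp_num = 4096
    # spread across 60 OSDs
    print(pgs_per_osd(4096, 3, 60))   # ~204.8 PGs per OSD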

As HDDs grow, though, the interface isn't getting any faster (with SATA at 
least), and there are only so many IOPS and MB/s you're going to get out of 
one drive no matter how you slice it.  Everything always depends on your 
use case and workload, but I suspect that the bottleneck is often the drive 
itself, not PG or OSD serialization.

For example, do you prize IOPS, latency, or MB/s the most?  If you don't care 
about latency, you can drive your HDDs harder and get more MB/s of throughput 
out of them, though your average latency might climb to 100 ms, which e.g. RBD 
VM clients probably wouldn't be too happy about but an object service *might* 
tolerate.

Basically, in the absence of more info, I would personally suggest aiming for 
an average in the 150-200 range, with pgp_num a power of 2.  If you aim a bit 
high, the ratio will come down when you add nodes/OSDs to the cluster to gain 
capacity.  Be sure to balance usage and watch your mon_max_pg_per_osd setting, 
allowing some headroom for natural variation and for when components fail.
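
As a rough Python sketch of that sizing logic (the helper name and the example 
numbers are my own, not anything standard; target_per_osd just encodes the 
150-200 aim above):

    import math

    def suggest_pgp_num(num_osds, replication, target_per_osd=175):
        # PG count that would hit the target average, rounded to the
        # nearest power of two as suggested above
        raw = target_per_osd * num_osds / replication
        return 2 ** round(math.log2(raw))

    # Hypothetical example: 60 OSDs, 3x replication
    pgp = suggest_pgp_num(60, 3)
    print(pgp, pgp * 3 / 60)   # 4096 -> ~204.8 PGs per OSD

With these particular numbers the nearest power of two lands slightly above 
200; per the above, that ratio drifts back down as OSDs are added for 
capacity.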

YMMV.  

— aad

> On Sep 5, 2020, at 10:34 AM, huxia...@horebdata.cn wrote:
> 
> Dear Ceph folks,
> 
> As the capacity of one HDD (OSD) is growing bigger and bigger, e.g. from 6TB 
> up to 18TB or even more, should the number of PGs per OSD increase as well, 
> e.g. from 200 to 800?  As far as I know, the capacity of each PG should be 
> kept smaller for performance reasons due to the existence of PG locks, so 
> shall I set the number of PGs per OSD to 1000 or even 2000?  What is the 
> actual reason for not setting the number of PGs per OSD higher?  Are there 
> any practical limitations on the number of PGs?
> 
> thanks a lot,
> 
> Samuel 
> 
> 
> 
> 
> huxia...@horebdata.cn
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io