Here's a little more information on our use case: https://github.com/deis/deis/issues/3638
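As an aside, the HEALTH_WARN figure quoted below falls out of simple arithmetic: every PG replica counts against some OSD, so with 12 pools of 128 PGs each at size 3 spread over 3 OSDs, every OSD ends up hosting 1536 PG replicas. A rough sketch (the helper name here is ours for illustration, not a Ceph API):

```python
def pg_replicas_per_osd(pools, pg_per_pool, size, osds):
    # Each PG has `size` replicas, and all replicas are distributed
    # across the cluster's OSDs, so each OSD carries roughly
    # pools * pg_per_pool * size / osds PG replicas.
    return pools * pg_per_pool * size // osds

# This cluster: 12 default pools, pg_num 128, size 3, 3 OSDs.
print(pg_replicas_per_osd(12, 128, 3, 3))  # 1536 -> "1536 > max 300"
```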
On Wed, May 6, 2015 at 2:53 PM, Chris Armstrong <[email protected]> wrote:

> Thanks for the feedback. That language is confusing to me, then, since the
> first paragraph seems to suggest using a pg_num of 128 in cases where we
> have fewer than 5 OSDs, as we do here.
>
> The warning below that is: "As the number of OSDs increases, choosing the
> right value for pg_num becomes more important because it has a significant
> influence on the behavior of the cluster as well as the durability of the
> data when something goes wrong (i.e. the probability that a catastrophic
> event leads to data loss)." That suggests this could be an issue with
> more OSDs, which doesn't apply here.
>
> Do we know if this warning is calculated based on the resources of the
> host? If I try with larger machines, will this warning change?
>
> On Wed, May 6, 2015 at 2:41 PM, <[email protected]> wrote:
>
>> Hi,
>>
>> You have too many PGs for too few OSDs.
>> As the docs you linked say:
>>
>> When using multiple data pools for storing objects, you need to ensure
>> that you balance the number of placement groups per pool with the number
>> of placement groups per OSD so that you arrive at a reasonable total
>> number of placement groups that provides reasonably low variance per OSD
>> without taxing system resources or making the peering process too slow.
>>
>> For instance, a cluster of 10 pools each with 512 placement groups on ten
>> OSDs is a total of 5,120 placement groups spread over ten OSDs, that is
>> 512 placement groups per OSD. That does not use too many resources.
>> However, if 1,000 pools were created with 512 placement groups each, the
>> OSDs will handle ~50,000 placement groups each and it would require
>> significantly more resources and time for peering.
>>
>> So, remove useless pools or add OSDs.
>>
>> On 06/05/2015 23:32, Chris Armstrong wrote:
>> > Hi folks,
>> >
>> > Calling on the collective Ceph knowledge here.
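The docs' sizing example quoted above (which ignores replication) can be checked with a quick sketch; the helper name is ours for illustration, not anything from Ceph:

```python
def pgs_per_osd(pools, pg_per_pool, osds):
    # Total PGs spread evenly across OSDs, ignoring replication,
    # exactly as the docs' example does.
    return pools * pg_per_pool // osds

print(pgs_per_osd(10, 512, 10))    # 512 per OSD: reasonable
print(pgs_per_osd(1000, 512, 10))  # 51200 per OSD (~50,000): far too many
```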
>> > Since upgrading to Hammer, we're now seeing:
>> >
>> >     health HEALTH_WARN
>> >            too many PGs per OSD (1536 > max 300)
>> >
>> > We have 3 OSDs, so we used a pg_num of 128 based on the suggestion here:
>> > http://ceph.com/docs/master/rados/operations/placement-groups/
>> >
>> > We're also using the 12 default pools:
>> >
>> > root@ca-deis-1:/# ceph osd lspools
>> > 0 rbd,1 data,2 metadata,3 .rgw.root,4 .rgw.control,5 .rgw,6 .rgw.gc,7 .users.uid,8 .users,9 .rgw.buckets.index,10 .rgw.buckets,11 .rgw.buckets.extra,
>> >
>> > Here's the output of ceph osd dump:
>> >
>> > root@ca-deis-1:/# ceph osd dump
>> > epoch 46
>> > fsid 7bd27c76-f5f8-4eea-819b-379177929653
>> > created 2015-05-06 20:40:01.658764
>> > modified 2015-05-06 21:05:18.391730
>> > flags
>> > pool 0 'rbd' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 18 flags hashpspool stripe_width 0
>> > pool 1 'data' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 11 flags hashpspool crash_replay_interval 45 stripe_width 0
>> > pool 2 'metadata' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 10 flags hashpspool stripe_width 0
>> > pool 3 '.rgw.root' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 20 flags hashpspool stripe_width 0
>> > pool 4 '.rgw.control' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 22 flags hashpspool stripe_width 0
>> > pool 5 '.rgw' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 24 flags hashpspool stripe_width 0
>> > pool 6 '.rgw.gc' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 25 flags hashpspool stripe_width 0
>> > pool 7 '.users.uid' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 26 flags hashpspool stripe_width 0
>> > pool 8 '.users' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 28 flags hashpspool stripe_width 0
>> > pool 9 '.rgw.buckets.index' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 30 flags hashpspool stripe_width 0
>> > pool 10 '.rgw.buckets' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 35 flags hashpspool stripe_width 0
>> > pool 11 '.rgw.buckets.extra' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 40 flags hashpspool stripe_width 0
>> > max_osd 3
>> > osd.0 up in weight 1 up_from 4 up_thru 45 down_at 0 last_clean_interval [0,0) 10.132.162.16:6800/1 10.132.162.16:6801/1 10.132.162.16:6802/1 10.132.162.16:6803/1 exists,up d996b242-7fce-475f-a889-fa14038de180
>> > osd.1 up in weight 1 up_from 7 up_thru 45 down_at 0 last_clean_interval [0,0) 10.132.253.121:6800/1 10.132.253.121:6801/1 10.132.253.121:6802/1 10.132.253.121:6803/1 exists,up 8ef7080d-ca37-4003-ae54-b76ddd13f752
>> > osd.2 up in weight 1 up_from 45 up_thru 45 down_at 43 last_clean_interval [38,44) 10.132.253.118:6801/1 10.132.253.118:6805/1000001 10.132.253.118:6806/1000001 10.132.253.118:6807/1000001 exists,up 7b30f8aa-732b-4dca-bfbd-2dca9fb3c5ec
>> >
>> > Note that we have 3 replicas of our data (size 3) so that we can operate
>> > with just one host up.
>> >
>> > We've seen performance issues before (especially during platform start),
>> > which has me thinking: are we using too many placement groups given the
>> > small number of OSDs and the fact that we're forcing each OSD to have a
>> > full set of the data with size=3? Maybe the performance issues are to be
>> > expected since we're pushing around so many PGs on startup.
>> >
>> > This logic has not changed since our use of Firefly and Giant, so I'm
>> > not sure what changed. Some guidance is appreciated.
>> >
>> > Thanks!
>> >
>> > Chris

--
*Chris Armstrong* | Deis Team Lead | *Engine Yard* | t: @carmstrong_afk <https://twitter.com/carmstrong_afk> | gh: carmstrong <https://github.com/carmstrong>

Deis: github.com/deis/deis | docs.deis.io | #deis <https://botbot.me/freenode/deis/>

Deis is now part of Engine Yard! http://deis.io/deis-meet-engine-yard/
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
