Last summer we increased an EC 8+3 pool from 1024 to 2048 PGs on our ~1500 OSD (Kraken) cluster. This pool contained ~2 petabytes of data at the time.
We did a fair amount of testing on a throwaway pool on the same cluster beforehand, starting with small increases (16/32/64). The main observation was that the act of splitting the PGs causes issues, not the resulting data movement, assuming your backfills are tuned to a level where they don't affect client IO. As the PG splitting and peering (pg_num and pgp_num) increases are a) non-reversible and b) take effect immediately, overly large increases can end up in an unhappy mess of excessive storage-node load, flapping OSDs and blocked requests. We ended up doing increases of 128 PGs at a time. I'd hazard a guess that you will be fine going straight to 512 PGs, but the only way to be sure of the correct increase size for your cluster is to test it.

Cheers,
Tom

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Karun Josy
Sent: 02 January 2018 16:23
To: Hans van den Bogert <hansbog...@gmail.com>
Cc: ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Increasing PG number

https://access.redhat.com/solutions/2457321

It says it is a very intensive process and can affect cluster performance. Our version is Luminous 12.2.2, and we are using an erasure coding profile for a pool 'ecpool' with k=5 and m=3. The current PG number is 256, and the pool holds about 20 TB of data. Should I increase it gradually, or set pg_num to 512 in one step?

Karun Josy

On Tue, Jan 2, 2018 at 9:26 PM, Hans van den Bogert <hansbog...@gmail.com> wrote:

Please refer to the standard documentation as much as possible:
http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/#set-the-number-of-placement-groups

That is also incomplete on its own, since you need to change 'pgp_num' as well.

Regards,
Hans

On Jan 2, 2018, at 4:41 PM, Vladimir Prokofev <v...@prokofev.me> wrote:

I increased the number of PGs in multiple pools in a production cluster on 12.2.2 recently, with zero issues. Ceph claims that increasing pg_num and pgp_num are safe operations, which are essential for its ability to scale, and this sounds pretty reasonable to me. [1]

[1] https://www.sebastien-han.fr/blog/2013/03/12/ceph-change-pg-number-on-the-fly/

2018-01-02 18:21 GMT+03:00 Karun Josy <karunjo...@gmail.com>:

Hi,

The initial PG count was not properly planned while setting up the cluster, so now there are fewer than 50 PGs per OSD. What are the best practices for increasing the PG number of a pool? We have replicated pools as well as EC pools. Or is it better to create a new pool with a higher PG number?

Karun

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
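[Editor's note] The sizing concern Karun raises (fewer than 50 PGs per OSD) is usually checked against the rule of thumb from the Ceph documentation: target roughly 100 PGs per OSD, divide by the pool's replication factor (or k+m for an EC pool), and round up to a power of two. A minimal sketch of that arithmetic; the 40-OSD figure in the example is hypothetical, as the thread does not state Karun's OSD count:

```python
def recommended_pg_count(num_osds, pool_size, target_pgs_per_osd=100):
    """Rule-of-thumb PG count for a single pool: (OSDs * target PGs per
    OSD) / pool size, rounded up to the next power of two. For an EC
    pool, pool_size is k + m. This ignores how data is split across
    pools, so treat the result as a starting point, not a prescription."""
    raw = num_osds * target_pgs_per_osd / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power

# Hypothetical example: an EC pool with k=5, m=3 (pool size 8) on 40 OSDs.
print(recommended_pg_count(40, 5 + 3))  # -> 512
```

The real pgcalc tool additionally weights each pool by its expected share of the cluster's data, so a cluster with many pools would target a lower per-pool count than this single-pool estimate.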
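[Editor's note] Tom's step-wise approach above (raising pg_num and pgp_num by 128 at a time, waiting for the cluster to settle between steps) can be sketched as a small generator of the `ceph osd pool set` commands involved. The commands are returned as strings for review rather than executed, since they require a live cluster; on a real cluster you would run each pair and wait for HEALTH_OK before the next step:

```python
def pg_increase_steps(pool, current, target, step=128):
    """Generate the ceph CLI command pairs for stepping a pool's pg_num
    (and matching pgp_num) up gradually rather than in one jump. The
    commands are collected as strings, not executed."""
    cmds = []
    while current < target:
        current = min(current + step, target)
        cmds.append(f"ceph osd pool set {pool} pg_num {current}")
        cmds.append(f"ceph osd pool set {pool} pgp_num {current}")
    return cmds

# Karun's case: 'ecpool' from 256 to 512 PGs, in steps of 128.
for cmd in pg_increase_steps("ecpool", 256, 512, step=128):
    print(cmd)
```

Whether 128 is the right step size for a given cluster is exactly the question Tom raises: the only way to be sure is to test on a throwaway pool first.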