Re: [ceph-users] Re-weight Entire Cluster?
OIC, thanks for providing the tree output.  From what you wrote originally it
seemed plausible that you were mixing up the columns, which is not an uncommon
thing to do.

If all of your OSDs are the same size and have a CRUSH weight of 1.0, then you
just have the usual OSD fullness distribution problem.  If there are other
OSDs in the cluster that are the same size as these but have different CRUSH
weights, then you do have a problem.  Is that the case?  Feel free to email me
your entire "ceph osd tree" output privately if you like, to avoid spamming
the list.

— aad

> Hi Anthony,
>
> When the OSDs were added, it appears they were added with a CRUSH weight of
> 1, so I believe we need to change the weighting, as we are getting a lot of
> very full OSDs.
>
> -21  20.0      host somehost
> 216   1.0          osd.216   up   1.0   1.0
> 217   1.0          osd.217   up   1.0   1.0
> 218   1.0          osd.218   up   1.0   1.0
> 219   1.0          osd.219   up   1.0   1.0
> 220   1.0          osd.220   up   1.0   1.0
> 221   1.0          osd.221   up   1.0   1.0
> 222   1.0          osd.222   up   1.0   1.0
> 223   1.0          osd.223   up   1.0   1.0
>
> -----Original Message-----
> From: Anthony D'Atri
> Date: Tuesday, May 30, 2017 at 1:10 PM
> To: ceph-users
> Cc: Cave Mike
> Subject: Re: [ceph-users] Re-weight Entire Cluster?
>
>> It appears the current best practice is to weight each OSD according to its
>> size (3.64 for a 4TB drive, 7.45 for an 8TB drive, etc).
>
> OSDs are created with those sorts of CRUSH weights by default, yes.  Which
> is convenient, but it's important to know that those weights are arbitrary,
> and what really matters is how the weight of each OSD / host / rack compares
> to its siblings.  They are relative weights, not absolute capacities.
>
>> As it turns out, it was not configured this way at all; all of the OSDs are
>> weighted at 1.
>
> Are you perhaps confusing CRUSH weights with override weights?  In the
> example below each OSD has a CRUSH weight of 3.48169, but the override
> reweight is 1.000.  The override ranges from 0 to 1.  It is admittedly
> confusing to have two different things called weight.  Ceph's
> reweight-by-utilization, for example, acts by adjusting the override
> reweight and does not touch the CRUSH weights.
>
> ID  WEIGHT    TYPE NAME            UP/DOWN  REWEIGHT  PRIMARY-AFFINITY
> -44 83.56055      host somehostname
> 936  3.48169          osd.936           up       1.0               1.0
> 937  3.48169          osd.937           up       1.0               1.0
> 938  3.48169          osd.938           up       1.0               1.0
> 939  3.48169          osd.939           up       1.0               1.0
> 940  3.48169          osd.940           up       1.0               1.0
> 941  3.48169          osd.941           up       1.0               1.0
>
> If you see something similar from "ceph osd tree", then chances are there is
> no point in changing anything, since with CRUSH weights all that matters is
> how they compare across OSDs/hosts/racks/etc.  So you could double all of
> them just for grins, and nothing in how the cluster operates would change.
>
> — Anthony
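If it does turn out to be just the usual fullness spread across same-sized
OSDs, reweight-by-utilization is the usual tool, since it only touches the
override reweights.  Roughly along these lines -- the 120 threshold is only an
example, and I'd look at the dry-run output before applying anything:

  # see the per-OSD utilization spread
  ceph osd df

  # dry run: report which OSDs would be adjusted and by how much
  ceph osd test-reweight-by-utilization 120

  # apply: lowers the override reweight (0..1) on the fullest OSDs
  ceph osd reweight-by-utilization 120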
Re: [ceph-users] Re-weight Entire Cluster?
Hi Anthony,

When the OSDs were added, it appears they were added with a CRUSH weight of 1,
so I believe we need to change the weighting, as we are getting a lot of very
full OSDs.

-21  20.0      host somehost
216   1.0          osd.216   up   1.0   1.0
217   1.0          osd.217   up   1.0   1.0
218   1.0          osd.218   up   1.0   1.0
219   1.0          osd.219   up   1.0   1.0
220   1.0          osd.220   up   1.0   1.0
221   1.0          osd.221   up   1.0   1.0
222   1.0          osd.222   up   1.0   1.0
223   1.0          osd.223   up   1.0   1.0

-----Original Message-----
From: Anthony D'Atri
Date: Tuesday, May 30, 2017 at 1:10 PM
To: ceph-users
Cc: Cave Mike
Subject: Re: [ceph-users] Re-weight Entire Cluster?

> It appears the current best practice is to weight each OSD according to its
> size (3.64 for a 4TB drive, 7.45 for an 8TB drive, etc).

OSDs are created with those sorts of CRUSH weights by default, yes.  Which is
convenient, but it's important to know that those weights are arbitrary, and
what really matters is how the weight of each OSD / host / rack compares to
its siblings.  They are relative weights, not absolute capacities.

> As it turns out, it was not configured this way at all; all of the OSDs are
> weighted at 1.

Are you perhaps confusing CRUSH weights with override weights?  In the example
below each OSD has a CRUSH weight of 3.48169, but the override reweight is
1.000.  The override ranges from 0 to 1.  It is admittedly confusing to have
two different things called weight.  Ceph's reweight-by-utilization, for
example, acts by adjusting the override reweight and does not touch the CRUSH
weights.

ID  WEIGHT    TYPE NAME            UP/DOWN  REWEIGHT  PRIMARY-AFFINITY
-44 83.56055      host somehostname
936  3.48169          osd.936           up       1.0               1.0
937  3.48169          osd.937           up       1.0               1.0
938  3.48169          osd.938           up       1.0               1.0
939  3.48169          osd.939           up       1.0               1.0
940  3.48169          osd.940           up       1.0               1.0
941  3.48169          osd.941           up       1.0               1.0

If you see something similar from "ceph osd tree", then chances are there is
no point in changing anything, since with CRUSH weights all that matters is
how they compare across OSDs/hosts/racks/etc.  So you could double all of them
just for grins, and nothing in how the cluster operates would change.

— Anthony
Re: [ceph-users] Re-weight Entire Cluster?
> It appears the current best practice is to weight each OSD according to its
> size (3.64 for a 4TB drive, 7.45 for an 8TB drive, etc).

OSDs are created with those sorts of CRUSH weights by default, yes.  Which is
convenient, but it's important to know that those weights are arbitrary, and
what really matters is how the weight of each OSD / host / rack compares to
its siblings.  They are relative weights, not absolute capacities.

> As it turns out, it was not configured this way at all; all of the OSDs are
> weighted at 1.

Are you perhaps confusing CRUSH weights with override weights?  In the example
below each OSD has a CRUSH weight of 3.48169, but the override reweight is
1.000.  The override ranges from 0 to 1.  It is admittedly confusing to have
two different things called weight.  Ceph's reweight-by-utilization, for
example, acts by adjusting the override reweight and does not touch the CRUSH
weights.

ID  WEIGHT    TYPE NAME            UP/DOWN  REWEIGHT  PRIMARY-AFFINITY
-44 83.56055      host somehostname
936  3.48169          osd.936           up       1.0               1.0
937  3.48169          osd.937           up       1.0               1.0
938  3.48169          osd.938           up       1.0               1.0
939  3.48169          osd.939           up       1.0               1.0
940  3.48169          osd.940           up       1.0               1.0
941  3.48169          osd.941           up       1.0               1.0

If you see something similar from "ceph osd tree", then chances are there is
no point in changing anything, since with CRUSH weights all that matters is
how they compare across OSDs/hosts/racks/etc.  So you could double all of them
just for grins, and nothing in how the cluster operates would change.

— Anthony
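P.S.  In command terms the two knobs are separate.  A rough illustration, with
osd.936 and the values used purely as examples:

  # CRUSH weight: relative capacity, conventionally the drive size in TiB
  ceph osd crush reweight osd.936 3.48169

  # override reweight: a 0..1 throttle on how much data the OSD takes;
  # this is the value reweight-by-utilization adjusts
  ceph osd reweight 936 0.95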
Re: [ceph-users] Re-weight Entire Cluster?
Hi Mike,

On 30.05.2017 01:49, Mike Cave wrote:
> Greetings All,
>
> I recently started working with our ceph cluster here and have been
> reading about weighting.
>
> It appears the current best practice is to weight each OSD according
> to its size (3.64 for 4TB drive, 7.45 for 8TB drive, etc).
>
> As it turns out, it was not configured this way at all; all of the
> OSDs are weighted at 1.
>
> So my questions are:
>
> Can we re-weight the entire cluster to 3.64 and then re-weight the 8TB
> drives afterwards at a slow rate which won't impact performance?
>
> If we do an entire re-weight will we have any issues?

I would set osd_max_backfills and osd_recovery_max_active to 1 (with
injectargs) before starting the reweight, to minimize the impact on running
clients.  After setting everything to 3.64 you can raise the weight of the
8TB drives one by one.  Depending on your cluster/OSDs it is perhaps also a
good idea to adjust the primary affinity of the 8TB drives during the
reweight; otherwise you will get more reads from the (slower) 8TB drives.

> Would it be better to just reweight the 8TB drives to 2 gradually?

I would go for 3.64 - then you have the right settings if you create further
OSDs with ceph-deploy.

Udo
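For example, roughly like this -- untested, and the OSD numbers, the 7.45
weight for the 8TB drives and the 0.5 primary affinity are only placeholders
for your environment:

  # throttle recovery/backfill before touching any weights
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

  # bring a 4TB OSD up to its "real" CRUSH weight
  ceph osd crush reweight osd.216 3.64

  # later, raise the 8TB drives one by one
  ceph osd crush reweight osd.300 7.45

  # optionally shift primary reads away from the slower 8TB drives
  # (older releases may need "mon osd allow primary affinity = true" first)
  ceph osd primary-affinity osd.300 0.5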
[ceph-users] Re-weight Entire Cluster?
Greetings All,

I recently started working with our ceph cluster here and have been reading
about weighting.

It appears the current best practice is to weight each OSD according to its
size (3.64 for a 4TB drive, 7.45 for an 8TB drive, etc).

As it turns out, it was not configured this way at all; all of the OSDs are
weighted at 1.

So my questions are:

Can we re-weight the entire cluster to 3.64 and then re-weight the 8TB drives
afterwards at a slow rate which won't impact performance?

If we do an entire re-weight, will we have any issues?

Would it be better to just reweight the 8TB drives to 2 gradually?

Any and all suggestions are welcome.

Cheers,
Mike Cave