Re: [ceph-users] CRUSH rebalance all at once or host-by-host?

2020-01-09 Thread Stefan Kooman
Quoting Sean Matheny (s.math...@auckland.ac.nz):
> I tested this out by setting norebalance and norecover, moving the host
> buckets under the rack buckets (all of them at once), and then unsetting the
> flags. Ceph started melting down with escalating slow requests, even with
> the backfill and recovery parameters set to throttle. I moved the host
> buckets back under the default root bucket and things mostly came right,
> but I was still left with some inactive/unknown PGs and had to restart some
> OSDs to get back to HEALTH_OK.
> 
> I’m sure there’s a way to tune things or fade in CRUSH weights gradually,
> but I’m happy just moving one host at a time.

For big changes like this you can use Dan's UPMAP trick:
https://www.slideshare.net/Inktank_Ceph/ceph-day-berlin-mastering-ceph-operations-upmap-and-the-mgr-balancer

Python script:
https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
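
Roughly, the workflow looks like this (a sketch only, with placeholder
bucket names; check the slides and the script's README before running
anything):

    # upmap needs all clients to be luminous or newer
    ceph osd set-require-min-compat-client luminous

    ceph osd set norebalance
    # make the big CRUSH change, e.g. move a host under its rack bucket
    ceph osd crush move <host> rack=<rack>
    # upmap-remapped.py prints "ceph osd pg-upmap-items ..." commands that
    # map the now-misplaced PGs back to the OSDs they currently sit on
    ./upmap-remapped.py | sh
    ceph osd unset norebalance

    # then let the balancer (upmap mode) remove those exceptions gradually
    ceph balancer mode upmap
    ceph balancer on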

This way you can pause the process, or get back to a HEALTH_OK state
whenever you want to.

Gr. Stefan


-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


Re: [ceph-users] CRUSH rebalance all at once or host-by-host?

2020-01-08 Thread Sean Matheny
I tested this out by setting norebalance and norecover, moving the host buckets
under the rack buckets (all of them at once), and then unsetting the flags.
Ceph started melting down with escalating slow requests, even with the backfill
and recovery parameters set to throttle. I moved the host buckets back under
the default root bucket and things mostly came right, but I was still left with
some inactive/unknown PGs and had to restart some OSDs to get back to HEALTH_OK.
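
For clarity, the sequence was roughly this (a reconstructed sketch, with
placeholder bucket names):

    ceph osd set norebalance
    ceph osd set norecover
    # move every host bucket under its new rack bucket
    ceph osd crush move <host> rack=<rack>    # repeated for each host
    ceph osd unset norecover
    ceph osd unset norebalance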

I’m sure there’s a way to tune things or fade in CRUSH weights gradually, but
I’m happy just moving one host at a time.

Our environment has 224 OSDs on 14 hosts, btw.

Cheers,
Sean M


On 8/01/2020, at 1:32 PM, Sean Matheny <s.math...@auckland.ac.nz> wrote:

We’re adding in a CRUSH hierarchy retrospectively in preparation for a big 
expansion. Previously we only had host and osd buckets, and now we’ve added in 
rack buckets.

I’ve set what I think are sensible settings to limit rebalancing, at least
ones that have worked in the past:
osd_max_backfills = 1
osd_recovery_threads = 1
osd_recovery_priority = 5
osd_client_op_priority = 63
osd_recovery_max_active = 3

I thought it would save a lot of unnecessary data movement if I moved the
existing host buckets under the new rack buckets all at once, rather than
host-by-host. The thinking goes that as long as recovery is throttled
correctly, it shouldn’t matter how many objects are misplaced.

1) Is doing it all at once advisable, or am I putting myself at much greater
risk if I have failures during the rebalance (which could take quite a
while)?
2) My failure domain is currently set at the host level. If I want to change
the failure domain to ‘rack’, when is the best time to make that change (e.g.
after the rebalancing from moving the hosts into the racks has finished)?

v12.2.2 if it makes a difference.

Cheers,
Sean M








[ceph-users] CRUSH rebalance all at once or host-by-host?

2020-01-07 Thread Sean Matheny
We’re adding in a CRUSH hierarchy retrospectively in preparation for a big 
expansion. Previously we only had host and osd buckets, and now we’ve added in 
rack buckets.

I’ve set what I think are sensible settings to limit rebalancing, at least
ones that have worked in the past:
osd_max_backfills = 1
osd_recovery_threads = 1
osd_recovery_priority = 5
osd_client_op_priority = 63
osd_recovery_max_active = 3
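
For reference, the backfill and recovery limits can also be adjusted at
runtime, something like this (a sketch):

    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 3'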

I thought it would save a lot of unnecessary data movement if I moved the
existing host buckets under the new rack buckets all at once, rather than
host-by-host. The thinking goes that as long as recovery is throttled
correctly, it shouldn’t matter how many objects are misplaced.
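
Concretely, the move I have in mind would look something like this (a
sketch with made-up rack and host names):

    # create the new rack buckets and attach them under the root
    ceph osd crush add-bucket rack1 rack
    ceph osd crush move rack1 root=default
    # then move each existing host bucket under its rack, all in one go
    ceph osd crush move node01 rack=rack1
    ceph osd crush move node02 rack=rack1
    # ...and so on for the remaining hosts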

1) Is doing it all at once advisable, or am I putting myself at much greater
risk if I have failures during the rebalance (which could take quite a
while)?
2) My failure domain is currently set at the host level. If I want to change
the failure domain to ‘rack’, when is the best time to make that change (e.g.
after the rebalancing from moving the hosts into the racks has finished)?
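
(For context, I assume the rule change itself would look something like the
following, with a made-up rule name; the question is really about the
timing:)

    # new replicated rule that places copies across racks
    ceph osd crush rule create-replicated replicated_racks default rack
    # switch a pool over once ready
    ceph osd pool set <pool> crush_rule replicated_racks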

v12.2.2 if it makes a difference.

Cheers,
Sean M




