Do I misunderstand this script, or does it not _quite_ do what’s desired here?

I fully get the scenario of applying a full-cluster map to allow incremental topology changes. To be clear: if this is run to effectively freeze backfill during or following a traumatic event, it will freeze that adapted state, not strictly return to the pre-event state? And thus the upmap balancer would still need to be run to revert to the prior state? And would this also hold true for a failed/replaced OSD?
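For reference, here is my reading of what the script does, sketched in miniature: for every PG that is currently remapped, pin the PG's "up" set back to its "acting" set via pg-upmap-items, i.e. freeze the data wherever it happens to sit right now. That is why I suspect it preserves the adapted state rather than the pre-event state. This is a hypothetical sketch and not the actual script: I'm assuming the JSON shape of 'ceph pg ls remapped -f json' (which varies between releases), and the real script is presumably more careful about erasure-coded pools, where OSD ordering matters.

    #!/usr/bin/env python
    # Minimal sketch of the "pin remapped PGs where they are" idea.
    # NOT the actual upmap-remapped.py; see caveats above.
    import json
    import subprocess

    out = subprocess.check_output(['ceph', 'pg', 'ls', 'remapped', '-f', 'json'])
    pgs = json.loads(out)
    if isinstance(pgs, dict):            # newer releases wrap the list
        pgs = pgs.get('pg_stats', [])

    for pg in pgs:
        # Naive positional pairing of "where CRUSH wants it" (up) vs
        # "where the data currently is" (acting); EC pools need real care here.
        mappings = [(u, a) for u, a in zip(pg['up'], pg['acting']) if u != a]
        if not mappings:
            continue
        cmd = ['ceph', 'osd', 'pg-upmap-items', pg['pgid']]
        for u, a in mappings:
            cmd += [str(u), str(a)]
        print(' '.join(cmd))             # print for review; don't apply blindly

If that reading is right, the exceptions it creates describe the post-event layout, so the balancer (or a previously saved plan) would still need to run afterwards to converge back to the old distribution.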
> On May 1, 2020, at 7:37 AM, Dylan McCulloch <d...@unimelb.edu.au> wrote:
>
> Thanks Dan, that looks like a really neat method & script for a few
> use-cases. We've actually used several of the scripts in that repo over
> the years, so, many thanks for sharing.
>
> That method will definitely help in the scenario in which a set of
> unnecessary pg remaps has been triggered and can be caught early and
> reverted. I'm still a little concerned about the possibility of, for
> example, a brief network glitch occurring at night and then waking up to
> a full, unbalanced cluster. Especially with NVMe clusters that can
> rapidly remap and rebalance (and for which we also have a greater
> impetus to squeeze out as much available capacity as possible with upmap
> due to cost per TB). It's just a risk I hadn't previously considered and
> was wondering if others have either run into it or felt any need to plan
> around it.
>
> Cheers,
> Dylan
>
>> From: Dan van der Ster <d...@vanderster.com>
>> Sent: Friday, 1 May 2020 5:53 PM
>> To: Dylan McCulloch <d...@unimelb.edu.au>
>> Cc: ceph-users <ceph-users@ceph.io>
>> Subject: Re: [ceph-users] upmap balancer and consequences of osds
>> briefly marked out
>>
>> Hi,
>>
>> You're correct that all the relevant upmap entries are removed when an
>> OSD is marked out.
>> You can try to use this script, which will recreate them and get the
>> cluster back to HEALTH_OK quickly:
>> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
>>
>> Cheers, Dan
>>
>> On Fri, May 1, 2020 at 9:36 AM Dylan McCulloch <d...@unimelb.edu.au> wrote:
>>>
>>> Hi all,
>>>
>>> We're using the upmap balancer, which has made a huge improvement in
>>> evenly distributing data on our osds and has provided a substantial
>>> increase in usable capacity.
>>>
>>> Currently on ceph version: 12.2.13 luminous
>>>
>>> We ran into a firewall issue recently which led to a large number of
>>> osds being briefly marked 'down' & 'out'. The osds came back 'up' &
>>> 'in' after about 25 mins and the cluster was fine, but it had to
>>> perform a significant amount of backfilling/recovery despite there
>>> being no end-user client I/O during that period.
>>>
>>> Presumably the large number of remapped pgs and backfills were due to
>>> pg_upmap_items being removed from the osdmap when osds were marked
>>> out, and subsequently those pgs were redistributed using the default
>>> crush algorithm. As a result of the brief outage our cluster became
>>> significantly imbalanced again, with several osds very close to full.
>>> Is there any reasonable mitigation for that scenario?
>>>
>>> The auto-balancer will not perform optimizations while there are
>>> degraded pgs, so it would only start reapplying pg upmap exceptions
>>> after initial recovery is complete (at which point capacity may be
>>> dangerously reduced). Similarly, as admins, we normally only apply
>>> changes when the cluster is in a healthy state, but if the same issue
>>> were to occur again, would it be advisable to manually apply balancer
>>> plans while initial recovery is still taking place?
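[Interjecting on the pg_upmap_items point above: a cheap way to gauge exposure is to snapshot the exception table periodically, so that after an event you can see how many entries were dropped. Another hypothetical sketch; I'm assuming 'ceph osd dump -f json' exposes a 'pg_upmap_items' array, which I believe it does since luminous, but verify on your release.]

    #!/usr/bin/env python
    # Sketch: count/list the upmap exceptions currently in the osdmap.
    # Saving this output periodically makes it easy to see how many
    # entries were lost after osds were briefly marked out.
    import json
    import subprocess

    osdmap = json.loads(subprocess.check_output(
        ['ceph', 'osd', 'dump', '-f', 'json']))
    items = osdmap.get('pg_upmap_items', [])
    print('epoch %d: %d pg_upmap_items entries' % (osdmap['epoch'], len(items)))
    for entry in items:
        pairs = ', '.join('%(from)d->%(to)d' % m for m in entry['mappings'])
        print('  %s: %s' % (entry['pgid'], pairs))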
>>> I guess my concern from this experience is that making use of the
>>> capacity gained by using the upmap balancer appears to carry some
>>> risk. i.e. it's possible for a brief outage to remove those space
>>> efficiencies relatively quickly and potentially result in full
>>> osds/cluster before the automatic balancer is able to resume and
>>> redistribute pgs using upmap.
>>>
>>> Curious whether others have any thoughts or experience regarding this.
>>>
>>> Cheers,
>>> Dylan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io