Do I misunderstand this script, or does it not _quite_ do what’s desired here?

I fully get the scenario of applying a full-cluster map to allow incremental topology changes. To be clear: if this is run to effectively freeze backfill during or following a traumatic event, it will freeze that adapted state, not strictly return to the pre-event state? And thus the upmap balancer would still need to be run to revert to the prior state? And would this also hold true for a failed/replaced OSD?
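For reference, here is my reading of what the script does, sketched in miniature: for every PG that is currently remapped, pin the PG's "up" set back to its "acting" set via pg-upmap-items, i.e. freeze the data wherever it happens to sit right now. That is why I suspect it preserves the adapted state rather than the pre-event state. This is a hypothetical sketch and not the actual script: I'm assuming the JSON shape of 'ceph pg ls remapped -f json' (which varies between releases), and the real script is presumably more careful about erasure-coded pools, where OSD ordering matters.

    #!/usr/bin/env python
    # Minimal sketch of the "pin remapped PGs where they are" idea.
    # NOT the actual upmap-remapped.py; see caveats above.
    import json
    import subprocess

    out = subprocess.check_output(['ceph', 'pg', 'ls', 'remapped', '-f', 'json'])
    pgs = json.loads(out)
    if isinstance(pgs, dict):            # newer releases wrap the list
        pgs = pgs.get('pg_stats', [])

    for pg in pgs:
        # Naive positional pairing of "where CRUSH wants it" (up) vs
        # "where the data currently is" (acting); EC pools need real care here.
        mappings = [(u, a) for u, a in zip(pg['up'], pg['acting']) if u != a]
        if not mappings:
            continue
        cmd = ['ceph', 'osd', 'pg-upmap-items', pg['pgid']]
        for u, a in mappings:
            cmd += [str(u), str(a)]
        print(' '.join(cmd))             # print for review; don't apply blindly

If that reading is right, the exceptions it creates describe the post-event layout, so the balancer (or a previously saved plan) would still need to run afterwards to converge back to the old distribution.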
> On May 1, 2020, at 7:37 AM, Dylan McCulloch <d...@unimelb.edu.au> wrote:
>
> Thanks Dan, that looks like a really neat method & script for a few
> use-cases. We've actually used several of the scripts in that repo over
> the years, so, many thanks for sharing.
>
> That method will definitely help in the scenario in which a set of
> unnecessary pg remaps has been triggered and can be caught early and
> reverted. I'm still a little concerned about the possibility of, for
> example, a brief network glitch occurring at night and then waking up to
> a full, unbalanced cluster. Especially with NVMe clusters that can
> rapidly remap and rebalance (and for which we also have a greater
> impetus to squeeze out as much available capacity as possible with upmap
> due to cost per TB). It's just a risk I hadn't previously considered and
> was wondering if others have either run into it or felt any need to plan
> around it.
>
> Cheers,
> Dylan
>
>> From: Dan van der Ster <d...@vanderster.com>
>> Sent: Friday, 1 May 2020 5:53 PM
>> To: Dylan McCulloch <d...@unimelb.edu.au>
>> Cc: ceph-users <ceph-users@ceph.io>
>> Subject: Re: [ceph-users] upmap balancer and consequences of osds
>> briefly marked out
>>
>> Hi,
>>
>> You're correct that all the relevant upmap entries are removed when an
>> OSD is marked out.
>> You can try to use this script, which will recreate them and get the
>> cluster back to HEALTH_OK quickly:
>> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
>>
>> Cheers, Dan
>>
>> On Fri, May 1, 2020 at 9:36 AM Dylan McCulloch <d...@unimelb.edu.au> wrote:
>>>
>>> Hi all,
>>>
>>> We're using the upmap balancer, which has made a huge improvement in
>>> evenly distributing data on our osds and has provided a substantial
>>> increase in usable capacity.
>>>
>>> Currently on ceph version: 12.2.13 luminous
>>>
>>> We ran into a firewall issue recently which led to a large number of
>>> osds being briefly marked 'down' & 'out'. The osds came back 'up' &
>>> 'in' after about 25 mins and the cluster was fine, but it had to
>>> perform a significant amount of backfilling/recovery despite there
>>> being no end-user client I/O during that period.
>>>
>>> Presumably the large number of remapped pgs and backfills were due to
>>> pg_upmap_items being removed from the osdmap when osds were marked
>>> out, and subsequently those pgs were redistributed using the default
>>> crush algorithm. As a result of the brief outage our cluster became
>>> significantly imbalanced again, with several osds very close to full.
>>> Is there any reasonable mitigation for that scenario?
>>>
>>> The auto-balancer will not perform optimizations while there are
>>> degraded pgs, so it would only start reapplying pg upmap exceptions
>>> after initial recovery is complete (at which point capacity may be
>>> dangerously reduced). Similarly, as admins, we normally only apply
>>> changes when the cluster is in a healthy state, but if the same issue
>>> were to occur again, would it be advisable to manually apply balancer
>>> plans while initial recovery is still taking place?
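[Interjecting on the pg_upmap_items point above: a cheap way to gauge exposure is to snapshot the exception table periodically, so that after an event you can see how many entries were dropped. Another hypothetical sketch; I'm assuming 'ceph osd dump -f json' exposes a 'pg_upmap_items' array, which I believe it does since luminous, but verify on your release.]

    #!/usr/bin/env python
    # Sketch: count/list the upmap exceptions currently in the osdmap.
    # Saving this output periodically makes it easy to see how many
    # entries were lost after osds were briefly marked out.
    import json
    import subprocess

    osdmap = json.loads(subprocess.check_output(
        ['ceph', 'osd', 'dump', '-f', 'json']))
    items = osdmap.get('pg_upmap_items', [])
    print('epoch %d: %d pg_upmap_items entries' % (osdmap['epoch'], len(items)))
    for entry in items:
        pairs = ', '.join('%(from)d->%(to)d' % m for m in entry['mappings'])
        print('  %s: %s' % (entry['pgid'], pairs))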
>>> I guess my concern from this experience is that making use of the
>>> capacity gained by using the upmap balancer appears to carry some
>>> risk. i.e. it's possible for a brief outage to remove those space
>>> efficiencies relatively quickly and potentially result in full
>>> osds/cluster before the automatic balancer is able to resume and
>>> redistribute pgs using upmap.
>>>
>>> Curious whether others have any thoughts or experience regarding this.
>>>
>>> Cheers,
>>> Dylan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io