Hi Dan,

I appreciate the quick response. In that case, would something like this be better, or is it overkill?
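For comparison, here is the minimal noout-only version I'd weigh this against, written as a dry run so it just prints the commands (the OSD id 12 and the Rook deployment name are only examples -- adjust for the real cluster, and drop the run wrapper to execute for real):

```shell
# Dry-run sketch of a noout-only firmware maintenance. OSD id and the
# Rook deployment name below are assumptions, not taken from our cluster.
run() { echo "+ $*"; }   # print instead of execute; remove to run for real

ID=12
run ceph osd add-noout "osd.${ID}"    # hold the down -> out transition for this OSD only
run kubectl -n rook-ceph scale deployment "rook-ceph-osd-${ID}" --replicas=0   # stop the OSD daemon
run echo "upgrade the drive firmware for osd.${ID} here"
run kubectl -n rook-ceph scale deployment "rook-ceph-osd-${ID}" --replicas=1   # restart; the OSD marks itself up
run ceph osd rm-noout "osd.${ID}"     # allow the normal down -> out behaviour again
```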
1. ceph osd add-noout osd.x    # prevent the OSD from being marked out while it is down
2. ceph osd add-noin osd.x     # prevent the OSD from being automatically marked back in
3. kubectl -n rook-ceph scale deployment rook-ceph-osd-<ID>-* --replicas=0    # stop the OSD daemon
4. ceph osd down osd.x         # exclude it from data placement and recovery operations
5. Upgrade the firmware on the OSD's drive
6. ceph osd up osd.x
7. kubectl -n rook-ceph scale deployment rook-ceph-osd-<ID>-* --replicas=1
8. ceph osd rm-noin osd.x
9. ceph osd rm-noout osd.x

Thanks,
Will

> On Feb 15, 2023, at 5:05 PM, Dan van der Ster <dvand...@gmail.com> wrote:
>
> Sorry -- Let me rewrite that second paragraph without overloading the
> term "rebalancing", which I recognize is confusing.
>
> ...
>
> In your case, where you want to perform a quick firmware update on the
> drive, you should just use noout.
>
> Without noout, the OSD will be marked out after 5 minutes and objects
> will be re-replicated to other OSDs -- those degraded PGs will move to
> the "backfilling" state and copy their objects onto new OSDs.
>
> With noout, the cluster won't start backfilling/recovering, but don't
> worry -- this won't block IO. What happens is that the disk having its
> firmware upgraded will be marked "down", and IO will be accepted
> and logged by its peers, so that when the disk is back "up" it can
> replay ("recover") those writes to catch up.
>
> The norebalance flag only affects data movement for PGs that are not
> degraded -- i.e. where no OSDs are down. This can be useful to pause
> backfilling, e.g. when you are adding or removing hosts in a cluster.
>
> -- dan
>
> On Wed, Feb 15, 2023 at 2:58 PM Dan van der Ster <dvand...@gmail.com> wrote:
>>
>> Hi Will,
>>
>> There are some misconceptions in your mail.
>>
>> 1. "noout" is a flag used to prevent the down -> out transition after
>> an OSD has been down for several minutes (default: 5 minutes).
>> 2. "norebalance" is a flag used to prevent objects from being
>> backfilled to a different OSD *if the PG is not degraded*.
>>
>> In your case, where you want to perform a quick firmware update on the
>> drive, you should just use noout.
>> Without noout, the OSD will be marked out after 5 minutes and data
>> will start rebalancing to other OSDs.
>> With noout, the cluster won't start rebalancing. But this won't block
>> IO -- the disk being repaired will be "down" and IO will be accepted
>> and logged by its peers, so that when the disk is back "up" it can
>> replay those writes to catch up.
>>
>> Hope that helps!
>>
>> Dan
>>
>>
>> On Wed, Feb 15, 2023 at 1:12 PM <wkonit...@mirantis.com> wrote:
>>>
>>> Hi,
>>>
>>> We have a discussion going on about which is the correct flag to use for
>>> some maintenance on an OSD: should it be "noout" or "norebalance"? This was
>>> sparked because we need to take an OSD out of service for a short while to
>>> upgrade its firmware.
>>>
>>> One school of thought is:
>>> - "ceph norebalance" prevents automatic rebalancing of data between OSDs,
>>> which Ceph does to ensure all OSDs have roughly the same amount of data.
>>> - "ceph noout", on the other hand, prevents OSDs from being marked
>>> out-of-service during maintenance, which helps maintain cluster performance
>>> and availability.
>>> - Additionally, if another OSD fails while the "norebalance" flag is set,
>>> the data redundancy and fault tolerance of the Ceph cluster may be
>>> compromised.
>>> - So if we're going to maintain performance and reliability, we need to
>>> set the "ceph noout" flag to prevent the OSD from being marked as OOS
>>> during maintenance and allow the automatic data redistribution feature of
>>> Ceph to work as intended.
>>>
>>> The other opinion is:
>>> - With the noout flag set, Ceph clients are forced to think that the OSD exists
>>> and is accessible, so they continue sending requests to that OSD. The OSD
>>> also remains in the CRUSH map without any sign that it is actually out.
>>> If an additional OSD fails in the cluster with the noout flag set, Ceph is
>>> forced to continue thinking that this newly failed OSD is OK. That leads to
>>> stalled or delayed responses from the OSD side to clients.
>>> - Norebalance, by contrast, takes the in/out OSD status into account but
>>> prevents data rebalancing. Clients are also aware of the real OSD status, so
>>> no requests go to an OSD which is actually out. If an additional OSD fails,
>>> only the required temporary PGs are created to maintain at least 2
>>> existing copies of the same data (well, generally that is set by the pool's
>>> min size).
>>>
>>> The upstream docs seem pretty clear that noout should be used for
>>> maintenance
>>> (https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-osd/),
>>> but the second opinion strongly suggests that norebalance is actually
>>> better and the Ceph docs are out of date.
>>>
>>> So what is the feedback from the wider community?
>>>
>>> Thanks,
>>> Will
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io