Hi Dan,

I appreciate the quick response. In that case, would something like this be better, or is it overkill?
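For comparison, here is the minimal noout-only version I'd weigh this against, written as a dry run so it just prints the commands (the OSD id 12 and the Rook deployment name are only examples -- adjust for the real cluster, and drop the run wrapper to execute for real):

```shell
# Dry-run sketch of a noout-only firmware maintenance. OSD id and the
# Rook deployment name below are assumptions, not taken from our cluster.
run() { echo "+ $*"; }   # print instead of execute; remove to run for real

ID=12
run ceph osd add-noout "osd.${ID}"    # hold the down -> out transition for this OSD only
run kubectl -n rook-ceph scale deployment "rook-ceph-osd-${ID}" --replicas=0   # stop the OSD daemon
run echo "upgrade the drive firmware for osd.${ID} here"
run kubectl -n rook-ceph scale deployment "rook-ceph-osd-${ID}" --replicas=1   # restart; the OSD marks itself up
run ceph osd rm-noout "osd.${ID}"     # allow the normal down -> out behaviour again
```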
1. ceph osd add-noout osd.x    # prevent the OSD from being marked out while it is down
2. ceph osd add-noin osd.x     # prevent the OSD from being automatically marked back in
3. kubectl -n rook-ceph scale deployment rook-ceph-osd-<ID>-* --replicas=0    # stop the OSD daemon
4. ceph osd down osd.x         # exclude it from data placement and recovery operations
5. Upgrade the firmware on the OSD's drive
6. ceph osd up osd.x
7. kubectl -n rook-ceph scale deployment rook-ceph-osd-<ID>-* --replicas=1
8. ceph osd rm-noin osd.x
9. ceph osd rm-noout osd.x

Thanks,
Will

> On Feb 15, 2023, at 5:05 PM, Dan van der Ster <dvand...@gmail.com> wrote:
>
> Sorry -- Let me rewrite that second paragraph without overloading the
> term "rebalancing", which I recognize is confusing.
>
> ...
>
> In your case, where you want to perform a quick firmware update on the
> drive, you should just use noout.
>
> Without noout, the OSD will be marked out after 5 minutes and objects
> will be re-replicated to other OSDs -- those degraded PGs will move to
> the "backfilling" state and copy their objects onto new OSDs.
>
> With noout, the cluster won't start backfilling/recovering, but don't
> worry -- this won't block IO. What happens is that the disk having its
> firmware upgraded will be marked "down", and IO will be accepted
> and logged by its peers, so that when the disk is back "up" it can
> replay ("recover") those writes to catch up.
>
> The norebalance flag only affects data movement for PGs that are not
> degraded -- i.e. where no OSDs are down. This can be useful to pause
> backfilling, e.g. when you are adding or removing hosts in a cluster.
>
> -- dan
>
> On Wed, Feb 15, 2023 at 2:58 PM Dan van der Ster <dvand...@gmail.com> wrote:
>>
>> Hi Will,
>>
>> There are some misconceptions in your mail.
>>
>> 1. "noout" is a flag used to prevent the down -> out transition after
>> an OSD has been down for several minutes (default: 5 minutes).
>> 2. "norebalance" is a flag used to prevent objects from being
>> backfilled to a different OSD *if the PG is not degraded*.
>>
>> In your case, where you want to perform a quick firmware update on the
>> drive, you should just use noout.
>> Without noout, the OSD will be marked out after 5 minutes and data
>> will start rebalancing to other OSDs.
>> With noout, the cluster won't start rebalancing. But this won't block
>> IO -- the disk being repaired will be "down" and IO will be accepted
>> and logged by its peers, so that when the disk is back "up" it can
>> replay those writes to catch up.
>>
>> Hope that helps!
>>
>> Dan
>>
>>
>> On Wed, Feb 15, 2023 at 1:12 PM <wkonit...@mirantis.com> wrote:
>>>
>>> Hi,
>>>
>>> We have a discussion going on about which is the correct flag to use for
>>> some maintenance on an OSD: should it be "noout" or "norebalance"? This was
>>> sparked because we need to take an OSD out of service for a short while to
>>> upgrade its firmware.
>>>
>>> One school of thought is:
>>> - "ceph norebalance" prevents automatic rebalancing of data between OSDs,
>>> which Ceph does to ensure all OSDs have roughly the same amount of data.
>>> - "ceph noout", on the other hand, prevents OSDs from being marked
>>> out-of-service during maintenance, which helps maintain cluster performance
>>> and availability.
>>> - Additionally, if another OSD fails while the "norebalance" flag is set,
>>> the data redundancy and fault tolerance of the Ceph cluster may be
>>> compromised.
>>> - So if we're going to maintain performance and reliability, we need to
>>> set the "ceph noout" flag to prevent the OSD from being marked as OOS
>>> during maintenance and allow the automatic data redistribution feature of
>>> Ceph to work as intended.
>>>
>>> The other opinion is:
>>> - With the noout flag set, Ceph clients are forced to think that the OSD exists
>>> and is accessible, so they continue sending requests to that OSD. The OSD
>>> also remains in the CRUSH map without any sign that it is actually out.
>>> If an additional OSD fails in the cluster with the noout flag set, Ceph is
>>> forced to continue thinking that this newly failed OSD is OK. That leads to
>>> stalled or delayed responses from the OSD side to clients.
>>> - Norebalance, by contrast, takes the in/out OSD status into account but
>>> prevents data rebalancing. Clients are also aware of the real OSD status, so
>>> no requests go to an OSD which is actually out. If an additional OSD fails,
>>> only the required temporary PGs are created to maintain at least 2
>>> existing copies of the same data (well, generally that is set by the pool's
>>> min size).
>>>
>>> The upstream docs seem pretty clear that noout should be used for
>>> maintenance
>>> (https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-osd/),
>>> but the second opinion strongly suggests that norebalance is actually
>>> better and the Ceph docs are out of date.
>>>
>>> So what is the feedback from the wider community?
>>>
>>> Thanks,
>>> Will
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io