Hi,

In our experience, failovers are largely transparent if the MDS has:
mds session blacklist on timeout = false
mds session blacklist on evict = false

And clients have:

client reconnect stale = true

Cheers, Dan

On Wed, Jan 27, 2021 at 9:09 AM Martin Hronek
<martin.hro...@rise-world.com> wrote:
>
> Hello fellow CEPH-users,
> currently we are updating our CEPH (14.2.16) and making changes to some
> config settings.
>
> TLDR: is there a way to make a graceful MDS active node shutdown without
> losing the caps, open files and client connections? Something like
> handover active state, promote standby to active, ...?
>
>
> Sadly we ran into some difficulties when restarting MDS nodes. While we
> had two active nodes and one standby, we initially thought that this would
> give a nice handover when restarting the active rank ... sadly we saw
> how the node was going through the states
> replay-reconnect-rejoin-active, as nicely visualized here:
> https://docs.ceph.com/en/latest/cephfs/mds-states/
>
> This left some nodes running into timeouts until the standby node had
> gone into the active state again, most probably since the cephfs already
> has some 600k folders and 3M files, and from the client side it took more
> than 30s.
>
> So before the next MDS restart the FS config was changed to one active
> and one standby-replay node; the idea was that since the MDS replay node
> follows the active one, the handover would be smoother. The active state
> was reached faster, but we still noticed some hiccups on the clients
> while the new active MDS was waiting for clients to reconnect (state
> up:reconnect) after the failover.
>
> The next idea was to do a manual node promotion, graceful shutdown or
> something similar - where the open caps and sessions would be handed
> over ... but I did not find any hint in the docs regarding this
> functionality.
> But, this should somehow be possible (imho), since when adding a second
> active mds node (max_mds 2) and then removing it again (max_mds 1) the
> rank 1 node goes to stopping-state and hands over all clients/caps to
> rank 0 without interruptions for the clients.
>
> Therefore my question: how can one gracefully shut down an active rank 0
> mds node or promote a standby node to the active state without losing
> open files/caps or client sessions?
>
> Thanks in advance,
> M
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
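
For reference, the three settings named at the top of this reply could be written as a ceph.conf fragment roughly like the one below. This is only a sketch: the reply names the options but not where they go, so the [mds] and [client] section placement is an assumption, and the option names use the "blacklist" spelling from this thread (14.2.x).

```ini
# Sketch only; section placement is an assumption, not stated in the thread.

[mds]
# Do not blacklist a client whose session timed out (e.g. during failover).
mds session blacklist on timeout = false
# Do not blacklist a client when its session is evicted.
mds session blacklist on evict = false

[client]
# Let a client whose session went stale try to reconnect.
client reconnect stale = true
```

On clusters that use the centralized config store, the same options can probably be set at runtime instead, e.g. `ceph config set mds mds_session_blacklist_on_timeout false`; whether a daemon or client restart is needed for them to take effect is not covered in this thread.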