Thanks for the replies,


It feels to me that cephadm should handle this case, since it offers the 
maintenance function. Right now I have a simple version of a playbook that just 
sets noout, patches the OS, reboots and unsets noout (similar to 
https://github.com/ceph/ceph-ansible/blob/main/infrastructure-playbooks/untested-by-ci/cluster-maintenance.yml
), and a different version that attempts the host maintenance but fails on the 
instance that is running the mgr. If I get anywhere with detecting that an 
instance is the active manager and handling that in Ansible, I will reply back here.
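
For reference, the direction I'm poking at for the detection looks roughly like 
this (untested sketch; it assumes "ceph mgr stat" reports the active daemon name 
as active_name, that mgr daemon names are of the form <hostname>.<suffix>, that 
the orch host name matches the short hostname, and that jq is installed):

# find the active mgr daemon name, e.g. node2-cobj2-atdev1-nvan.ghxlvw
active=$(cephadm shell -- ceph mgr stat -f json | jq -r '.active_name')

# mgr daemon names look like <hostname>.<suffix>, so compare the prefix
# against this host's short name and fail the mgr over first if it matches
if [ "${active%%.*}" = "$(hostname -s)" ]; then
  cephadm shell -- ceph mgr fail "$active"
fi

# with the mgr failed over (or if it never ran here), maintenance should work
cephadm shell -- ceph orch host maintenance enter "$(hostname -s)"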


Cheers

Steven Goodliff



________________________________
From: Robert Gallop <robert.gal...@gmail.com>
Sent: 13 July 2022 16:55
To: Adam King
Cc: Steven Goodliff; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: cephadm host maintenance

This brings up a good follow-on: rebooting in general for OS patching.

I have not been leveraging the maintenance mode function, as I found it was 
really no different from just setting noout and doing the reboot.  I find that 
if the box is the active manager, the failover happens quickly, painlessly and 
automatically.  All the OSDs just show as missing and come back once the box 
is back from the reboot…
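
Concretely, per host the whole dance is roughly this (a sketch of the idea, 
assuming the ceph CLI and admin keyring are available on the box driving the 
patching):

# keep the OSDs on the rebooting host from being marked out and rebalanced
ceph osd set noout

# ...patch the OS and reboot the host; its OSDs show as down until it returns...

# once the host and its OSDs are back, clear the flag
ceph osd unset noout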

Am I causing issues I may not be aware of?  How is everyone handling patching 
reboots?

The only place I'm careful is the active MDS nodes. Since that failover does 
cause a period of no I/O for the mounted clients, I generally fail it 
manually so I can ensure I don't have to wait for the MDS to figure out an 
instance is gone and spin up a standby…
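
For the MDS piece that's basically the following (sketch; daemon names as 
reported by "ceph fs status", adjust for your filesystem):

# see which daemon is currently the active MDS
ceph fs status

# fail the active daemon on the box about to be patched so a standby takes
# over right away instead of waiting for it to be detected as gone
ceph mds fail <active-mds-name>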

Any tips or techniques until there is a more holistic approach?

Thanks!


On Wed, Jul 13, 2022 at 9:49 AM Adam King 
<adk...@redhat.com> wrote:
Hello Steven,

Arguably, it should, but right now nothing is implemented to do so and
you'd have to manually run "ceph mgr fail
node2-cobj2-atdev1-nvan.ghxlvw" before it would allow you to put the host
in maintenance. It's non-trivial from a technical point of view to have it
do the switch automatically, as the cephadm instance is running on that
active mgr: it would have to record somewhere that the host should go into
maintenance, fail over the mgr itself, and then have the new cephadm
instance pick that up and finish putting the host into maintenance.
Possible, but not something anyone has had a chance to implement. FWIW, I
do believe there are also plans to eventually add a playbook for a rolling
reboot or something of the sort to https://github.com/ceph/cephadm-ansible.
But for now, I think some sort of intervention to cause the failover to
happen before running the maintenance enter command is necessary.
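
In other words, something along these lines (a rough sketch, run from a host 
with the admin keyring, with the daemon name taken from the error message or 
from "ceph mgr stat"):

# fail the active mgr so cephadm moves to a standby on another host
ceph mgr fail node2-cobj2-atdev1-nvan.ghxlvw

# once a standby has taken over, this should now be accepted
ceph orch host maintenance enter <hostname>

# and after the reboot
ceph orch host maintenance exit <hostname>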

Regards,
 - Adam King

On Wed, Jul 13, 2022 at 11:02 AM Steven Goodliff <
steven.goodl...@globalrelay.net> wrote:

>
> Hi,
>
>
> I'm trying to reboot a Ceph cluster one instance at a time by running an
> Ansible playbook which basically runs
>
>
> cephadm shell ceph orch host maintenance enter <hostname>, and then
> reboots the instance and exits maintenance
>
>
> but I get
>
>
> ALERT: Cannot stop active Mgr daemon, Please switch active Mgrs with 'ceph
> mgr fail node2-cobj2-atdev1-nvan.ghxlvw'
>
>
> on one instance.  Should cephadm handle the switch?
>
>
> thanks
>
> Steven Goodliff
> Global Relay
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
