Dear all,

How are you?

I have a cluster on Pacific with 3 hosts, each one with 1 mon, 1 mgr and 12 OSDs.

One of the hosts, darkside1, has been out of quorum according to ceph status.

Systemd showed 4 services dead, two mons and two mgrs.

I managed to restart one mon and one mgr with systemctl restart. The remaining mon and mgr services, however, keep returning to a failed state a few seconds after each restart, even after several attempts. They try to auto-restart and then go into a failed state where systemd requires me to run "reset-failed" manually before I can try to start them again. But they never stay up. There are no clear messages about the issue in /var/log/ceph/cephadm.log.

The host is still out of quorum.


I have failed to "turn on debug" as per https://docs.ceph.com/en/pacific/rados/troubleshooting/log-and-debug/. It seems I do not know the proper incantation for "ceph daemon X config show"; no string for X seems to satisfy this command. I have tried adding this:

[mon]
     debug mon = 20


to my ceph.conf, but no additional log lines are sent to /var/log/cephadm.log, so I'm sorry I can't provide more details.


Could someone help me debug this situation? I am sure that if I just reboot the machine, it will start up the services properly, as it always has done, but I would prefer to fix this without rebooting.


Cordially,

Renata.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io