Hi Adam,

it's ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
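
For reference, a minimal sketch of checking a `ceph version` string against the range named in the quoted reply below (Pacific releases older than 16.2.6). The helper names are hypothetical and not part of any Ceph tooling; the regular expression assumes the output format shown above.

```python
import re

# Hypothetical helper, not a Ceph API: pulls (major, minor, patch) and the
# release name out of a `ceph version` line like the one above.
VERSION_RE = re.compile(
    r"ceph version (\d+)\.(\d+)\.(\d+)\s+\([0-9a-f]+\)\s+(\w+)"
)

def parse_ceph_version(text):
    m = VERSION_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, release = m.groups()
    return (int(major), int(minor), int(patch)), release

def in_tracker_51027_range(version):
    # Range taken from the quoted reply: Pacific (16.x) older than 16.2.6.
    return (16, 0, 0) <= version < (16, 2, 6)

line = ("ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) "
        "nautilus (stable)")
version, release = parse_ceph_version(line)
print(version, release, in_tracker_51027_range(version))
# prints: (14, 2, 2) nautilus False
```

So 14.2.2 falls outside the Pacific range from the tracker, although the workarounds discussed there may still be worth reading.
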
19. 10. 2021 18:19:29 Adam King <adk...@redhat.com>:

> Hi Denis,
>
> Which ceph version is your cluster running on? I know there was an issue with
> mons getting dropped from the monmap (and therefore being stuck out of
> quorum) when their host was rebooted in Pacific versions prior to 16.2.6
> https://tracker.ceph.com/issues/51027. If you're on a Pacific version older
> than 16.2.6 it could be that same issue and workarounds are discussed in the
> tracker. Even if you are on 16.2.6 the workarounds in that tracker could
> still be helpful.
>
> On Tue, Oct 19, 2021 at 12:07 PM Denis Polom <denispo...@gmail.com> wrote:
>> Hi,
>>
>> one of our monitor VMs was rebooted and is not joining quorum again (quorum
>> consists of 3 monitors). While the monitor service (ceph1) is running on
>> this VM, the Ceph cluster becomes unreachable. In the monitor logs on the
>> ceph3 VM I can see a lot of the following messages:
>>
>>
>> 2021-10-19 17:50:19.555 7fe49e912700 0 log_channel(audit) log [DBG] :
>> from='client.? 10.13.68.11:0/1846917599'
>> entity='client.admin'
>> cmd=[{"prefix": "osd blacklist ls"}]: dispatch
>> 2021-10-19 17:50:20.255 7fe4a1117700 1 mon.ceph3@1(leader).paxos(paxos
>> updating c 95374479..95375018) accept timeout, calling fresh election
>> 2021-10-19 17:50:20.255 7fe49e912700 0 log_channel(cluster) log [INF] :
>> mon.ceph3 calling monitor election
>> 2021-10-19 17:50:20.255 7fe49e912700 1
>> mon.ceph3@1(electing).elector(42748) init, last seen epoch 42748
>> 2021-10-19 17:50:20.263 7fe49e912700 -1 mon.ceph3@1(electing) e4 failed
>> to get devid for : fallback method has serial ''but no model
>> 2021-10-19 17:50:21.491 7fe49b90c700 1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:23.567 7fe49b90c700 1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:23.771 7fe49b90c700 1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:24.175 7fe49c90e700 1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:24.979 7fe49c90e700 1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:25.223 7fe49c90e700 1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:25.263 7fe4a1117700 1
>> mon.ceph3@1(electing).elector(42749) init, last seen epoch 42749,
>> mid-election, bumping
>> 2021-10-19 17:50:25.271 7fe49c90e700 1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:25.279 7fe4a1117700 -1 mon.ceph3@1(electing) e4 failed
>> to get devid for : fallback method has serial ''but no model
>> 2021-10-19 17:50:25.487 7fe49c90e700 1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>>
>>
>> NTP is running on all nodes in the cluster and time is correctly in sync.
>>
>> Any help would be appreciated.
>>
>> thx!
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
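
A rough sketch for summarizing a monitor log like the excerpt quoted above: it counts election calls and `handle_auth_request failed to assign global_id` messages and collects the elector epochs, which may help show whether the monitor keeps restarting elections. It assumes each log entry sits on a single line (entries wrapped by a mail client would need to be rejoined first); `summarize` and the sample list are illustrative names, not Ceph tooling.

```python
import re
from collections import Counter

# Assumed entry layout, based on the excerpt: date, time, thread id, level, message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d+) ([0-9a-f]+)\s+(-?\d+) (.*)$"
)
EPOCH_RE = re.compile(r"elector\((\d+)\)")

def summarize(lines):
    """Return (message counts, list of elector epochs in order of appearance)."""
    counts = Counter()
    epochs = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip anything that is not a log entry
        msg = m.group(5)
        if "handle_auth_request failed to assign global_id" in msg:
            counts["auth_failed"] += 1
        elif "calling monitor election" in msg:
            counts["election_called"] += 1
        epoch = EPOCH_RE.search(msg)
        if epoch:
            epochs.append(int(epoch.group(1)))
    return counts, epochs

# Entries taken from the excerpt, rejoined onto single lines.
sample = [
    "2021-10-19 17:50:20.255 7fe49e912700 0 log_channel(cluster) log [INF] : mon.ceph3 calling monitor election",
    "2021-10-19 17:50:20.255 7fe49e912700 1 mon.ceph3@1(electing).elector(42748) init, last seen epoch 42748",
    "2021-10-19 17:50:21.491 7fe49b90c700 1 mon.ceph3@1(electing) e4 handle_auth_request failed to assign global_id",
    "2021-10-19 17:50:25.263 7fe4a1117700 1 mon.ceph3@1(electing).elector(42749) init, last seen epoch 42749, mid-election, bumping",
]
counts, epochs = summarize(sample)
print(dict(counts), epochs)
```

An elector epoch that keeps climbing while auth requests fail would be consistent with elections that never complete, though the log alone does not say why.
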