Hi Adam,

It's
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
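
As a rough sketch, the version line above can be checked against the range mentioned in the quoted reply below (Pacific releases older than 16.2.6). The helper names are hypothetical, and the affected range is taken only from that reply, not from the tracker itself, so treat the result as a hint rather than a diagnosis.

```python
import re

# Version string as reported in this thread (output of `ceph version`).
VERSION_LINE = (
    "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) "
    "nautilus (stable)"
)


def parse_ceph_version(line):
    """Return ((major, minor, patch), release_name) from a `ceph version` line."""
    m = re.match(r"ceph version (\d+)\.(\d+)\.(\d+)\S*\s+\(\w+\)\s+(\w+)", line)
    if m is None:
        raise ValueError(f"unrecognized version line: {line!r}")
    major, minor, patch, release = m.groups()
    return (int(major), int(minor), int(patch)), release


def in_range_described_for_51027(version):
    """True for Pacific (16.x) versions older than 16.2.6.

    Assumption: this range comes from the description in this thread only.
    """
    return (16, 0, 0) <= version < (16, 2, 6)


version, release = parse_ceph_version(VERSION_LINE)
print(version, release, in_range_described_for_51027(version))
```

For the 14.2.2 Nautilus string above, the check returns False, so the Pacific-specific range does not apply to this cluster, although the workarounds in the tracker may still be worth reading.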

19. 10. 2021 18:19:29 Adam King <adk...@redhat.com>:

> Hi Denis,
> 
> Which Ceph version is your cluster running? I know there was an issue with
> mons getting dropped from the monmap (and therefore being stuck out of
> quorum) when their host was rebooted in Pacific versions prior to 16.2.6:
> https://tracker.ceph.com/issues/51027. If you're on a Pacific version older
> than 16.2.6, it could be that same issue, and workarounds are discussed in
> the tracker. Even if you are on 16.2.6, the workarounds in that tracker
> could still be helpful.
> 
> On Tue, Oct 19, 2021 at 12:07 PM Denis Polom <denispo...@gmail.com> wrote:
>> Hi,
>> 
>> One of our monitor VMs was rebooted and is not rejoining the quorum (the
>> quorum consists of 3 monitors). While the monitor service (ceph1) is running
>> on this VM, the Ceph cluster becomes unreachable. In the monitor logs on the
>> ceph3 VM I can see a lot of the following messages:
>> 
>> 
>> 2021-10-19 17:50:19.555 7fe49e912700  0 log_channel(audit) log [DBG] :
>> from='client.? 10.13.68.11:0/1846917599'
>> entity='client.admin'
>> cmd=[{"prefix": "osd blacklist ls"}]: dispatch
>> 2021-10-19 17:50:20.255 7fe4a1117700  1 mon.ceph3@1(leader).paxos(paxos
>> updating c 95374479..95375018) accept timeout, calling fresh election
>> 2021-10-19 17:50:20.255 7fe49e912700  0 log_channel(cluster) log [INF] :
>> mon.ceph3 calling monitor election
>> 2021-10-19 17:50:20.255 7fe49e912700  1
>> mon.ceph3@1(electing).elector(42748) init, last seen epoch 42748
>> 2021-10-19 17:50:20.263 7fe49e912700 -1 mon.ceph3@1(electing) e4 failed
>> to get devid for : fallback method has serial ''but no model
>> 2021-10-19 17:50:21.491 7fe49b90c700  1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:23.567 7fe49b90c700  1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:23.771 7fe49b90c700  1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:24.175 7fe49c90e700  1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:24.979 7fe49c90e700  1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:25.223 7fe49c90e700  1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:25.263 7fe4a1117700  1
>> mon.ceph3@1(electing).elector(42749) init, last seen epoch 42749,
>> mid-election, bumping
>> 2021-10-19 17:50:25.271 7fe49c90e700  1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 2021-10-19 17:50:25.279 7fe4a1117700 -1 mon.ceph3@1(electing) e4 failed
>> to get devid for : fallback method has serial ''but no model
>> 2021-10-19 17:50:25.487 7fe49c90e700  1 mon.ceph3@1(electing) e4
>> handle_auth_request failed to assign global_id
>> 
>> 
>> NTP is running on all nodes in the cluster and the clocks are in sync.
>> 
>> Any help would be appreciated.
>> 
>> thx!
>> 
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io