This sounds a lot like https://tracker.ceph.com/issues/51027, which is
fixed by https://github.com/ceph/ceph/pull/42690.
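
If upgrading to a release that carries that fix is not an option right away,
a workaround that is often suggested for a stuck progress event (assuming
this really is the same bug) is to clear the progress events and fail over
the active mgr so the cephadm and progress modules re-initialize. Roughly,
using the daemon names from your output:

$ ceph progress clear                  # drop the stuck "Updating crash deployment" event
$ ceph mgr fail ceph1a.guidwn          # fail over the active mgr
$ ceph orch ps --refresh               # check that the orchestrator reports fresh state
$ ceph orch daemon restart mon.ceph1b  # optionally kick the daemon shown as "stopped"

That is only a sketch of the usual workaround, not something I have verified
against your cluster, so check ceph -s and the cephadm log afterwards.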

David

On Tue, Sep 7, 2021 at 7:31 AM mabi <m...@protonmail.ch> wrote:
>
> Hello
>
> I have a test Ceph 16.2.5 (Pacific) cluster deployed with cephadm on 7
> bare-metal nodes running Ubuntu 20.04 LTS. I just upgraded each node's
> kernel and performed a rolling reboot, and now the ceph -s output appears
> to be stuck and the manager service is deployed to only two nodes instead
> of three. Here is the ceph -s output:
>
>   cluster:
>     id:     fb48d256-f43d-11eb-9f74-7fd39d4b232a
>     health: HEALTH_WARN
>             OSD count 1 < osd_pool_default_size 3
>
>   services:
>     mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
>     mgr: ceph1a.guidwn(active, since 25m), standbys: ceph1c.bttxuu
>     osd: 1 osds: 1 up (since 30m), 1 in (since 3w)
>
>   data:
>     pools:   0 pools, 0 pgs
>     objects: 0 objects, 0 B
>     usage:   5.3 MiB used, 7.0 TiB / 7.0 TiB avail
>     pgs:
>
>   progress:
>     Updating crash deployment (-1 -> 6) (0s)
>       [............................]
>
> Ignore the HEALTH_WARN about the OSD count; I have not yet finished
> deploying all 3 OSDs. But you can see that the progress bar is stuck and
> that I only have 2 managers; the third manager does not seem to start, as
> can be seen here:
>
> $ ceph orch ps|grep stopped
> mon.ceph1b            ceph1b               stopped           4m ago   4w        -    2048M  <unknown>  <unknown>     <unknown>
>
> It looks like the orchestrator is stuck and is not continuing its work.
> Any idea how I can get it unstuck?
>
> Best regards,
> Mabi
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
