[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying
This sounds a lot like https://tracker.ceph.com/issues/51027, which is fixed
in https://github.com/ceph/ceph/pull/42690.

David

On Tue, Sep 7, 2021 at 7:31 AM mabi wrote:
>
> Hello,
>
> I have a test Ceph Pacific 16.2.5 cluster deployed with cephadm on 7
> bare-metal Ubuntu 20.04 LTS nodes. I just upgraded each node's kernel and
> performed a rolling reboot, and now the "ceph -s" output is stuck and the
> manager service is only deployed to two nodes instead of three. Here is
> the "ceph -s" output:
>
>   cluster:
>     id:     fb48d256-f43d-11eb-9f74-7fd39d4b232a
>     health: HEALTH_WARN
>             OSD count 1 < osd_pool_default_size 3
>
>   services:
>     mon: 2 daemons, quorum ceph1a,ceph1c (age 25m)
>     mgr: ceph1a.guidwn (active, since 25m), standbys: ceph1c.bttxuu
>     osd: 1 osds: 1 up (since 30m), 1 in (since 3w)
>
>   data:
>     pools:   0 pools, 0 pgs
>     objects: 0 objects, 0 B
>     usage:   5.3 MiB used, 7.0 TiB / 7.0 TiB avail
>     pgs:
>
>   progress:
>     Updating crash deployment (-1 -> 6) (0s)
>       []
>
> Ignore the HEALTH_WARN about the OSD count, because I have not finished
> deploying all 3 OSDs. But you can see that the progress bar is stuck, and
> I only have 2 managers; the third manager does not seem to start, as can
> be seen here:
>
>   $ ceph orch ps | grep stopped
>   mon.ceph1b  ceph1b  stopped  4m ago  4w  -  2048M
>
> It looks like the orchestrator is stuck and does not continue its job.
> Any idea how I can get it unstuck?
>
> Best regards,
> Mabi

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying
I forgot to mention: the progress bar not updating is a separate bug; you can
fail the mgr ("ceph mgr fail ceph1a.guidwn" in your example) to resolve that.
On the monitor side, I assume you deployed using labels? If so, just remove
the label from the host where the monitor did not start, let it fully
undeploy, then re-add the label, and it will redeploy.

On Wed, Sep 8, 2021 at 7:03 AM David Orman wrote:
>
> This sounds a lot like https://tracker.ceph.com/issues/51027, which is
> fixed in https://github.com/ceph/ceph/pull/42690.
>
> David
>
> [...]
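The two-step workaround described above can be sketched as a short command
sequence. This is only a sketch: the daemon name, hostname, and "mon" label
are taken from this thread, so adjust them for your own cluster:

```shell
# Workaround sketch, assuming the hostnames/labels from this thread.

# 1) Fail the active mgr so a standby takes over; this clears the stuck
#    progress-bar state held by the old active mgr:
ceph mgr fail ceph1a.guidwn

# 2) Cycle the "mon" label on the affected host so cephadm first undeploys
#    and then redeploys the stopped monitor:
ceph orch host label rm ceph1b mon    # cephadm removes mon.ceph1b
ceph orch ps --daemon-type mon        # wait until mon.ceph1b is fully gone
ceph orch host label add ceph1b mon   # cephadm redeploys the mon
```

Make sure the old mon daemon is completely gone before re-adding the label,
otherwise cephadm may not schedule a fresh deployment.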
[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying
Hello,

A few days later the ceph status progress bar is still stuck, and the third
mon is for some unknown reason still not deploying itself, as can be seen
from the "ceph orch ls" output below:

  ceph orch ls
  NAME           PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
  alertmanager   ?:9093,9094  1/1      3m ago     5w   count:1
  crash                       7/7      3m ago     5w   *
  grafana        ?:3000       1/1      3m ago     5w   count:1
  mgr                         2/2      3m ago     4w   count:2;label:mgr
  mon                         2/3      3m ago     16h  count:3;label:mon
  node-exporter  ?:9100       7/7      3m ago     5w   *
  osd                         1/1      3m ago     -
  prometheus     ?:9095       1/1      3m ago     5w   count:1

Is this a bug in cephadm, and is there a workaround?

Thanks for any hints.

‐‐‐ Original Message ‐‐‐
On Tuesday, September 7th, 2021 at 2:30 PM, mabi wrote:

> [...]
[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying
You must have missed the response to your thread, I suppose:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/

Zitat von mabi:

> [...]
[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying
No problem, and it looks like they will. Glad it worked out for you!

David

On Thu, Sep 9, 2021 at 9:31 AM mabi wrote:
>
> Thank you, Eugen. Indeed the answer went to Spam :(
>
> So thanks to David for his workaround; it worked like a charm. Hopefully
> these patches can make it into the next Pacific release.
>
> [...]
[ceph-users] Re: ceph progress bar stuck and 3rd manager not deploying
Thank you, Eugen. Indeed the answer went to Spam :(

So thanks to David for his workaround; it worked like a charm. Hopefully
these patches can make it into the next Pacific release.

‐‐‐ Original Message ‐‐‐
On Thursday, September 9th, 2021 at 2:33 PM, Eugen Block wrote:

> You must have missed the response to your thread, I suppose:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/
>
> [...]