Before I started the upgrade the cluster was healthy, but one OSD (osd.355) was down; I can't remember whether it was in or out.
Upgrade was started with
    ceph orch upgrade start --image goharbor.example.com/library/ceph/ceph:v15.2.9

The upgrade started but when Ceph tried to upgrade osd.355 it paused with the following messages:

2021-03-11T09:15:35.638104+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Target is goharbor.example.com/library/ceph/ceph:v15.2.9 with id dfc48307963697ff48acd9dd6fda4a7a24017b9d8124f86c2a542b0802fe77ba
2021-03-11T09:15:35.639882+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking mgr daemons...
2021-03-11T09:15:35.644170+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: All mgr daemons are up to date.
2021-03-11T09:15:35.644376+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking mon daemons...
2021-03-11T09:15:35.647669+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: All mon daemons are up to date.
2021-03-11T09:15:35.647866+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking crash daemons...
2021-03-11T09:15:35.652035+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Setting container_image for all crash...
2021-03-11T09:15:35.653683+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: All crash daemons are up to date.
2021-03-11T09:15:35.653896+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking osd daemons...
2021-03-11T09:15:36.273345+0000 mgr.pech-mon-2.cjeiyc [INF] It is presumed safe to stop ['osd.355']
2021-03-11T09:15:36.273504+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: It is presumed safe to stop ['osd.355']
2021-03-11T09:15:36.273887+0000 mgr.pech-mon-2.cjeiyc [INF] Upgrade: Redeploying osd.355
2021-03-11T09:15:36.276673+0000 mgr.pech-mon-2.cjeiyc [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.355 on host pech-hd-009 failed.


One of the first things the upgrade did was to upgrade the mons, so they were restarted, and now osd.355 no longer exists:

    $ ceph osd info osd.355
    Error EINVAL: osd.355 does not exist

But if I run a resume
    ceph orch upgrade resume
it still tries to upgrade osd.355, same message as above.

I tried to stop and start the upgrade again with
    ceph orch upgrade stop
    ceph orch upgrade start --image goharbor.example.com/library/ceph/ceph:v15.2.9
it still tries to upgrade osd.355, with the same message as above.

Looking at the source code, it looks like it gets the list of daemons to upgrade from the mgr cache, so I restarted both mgrs, but it still tries to upgrade osd.355.


Does anyone know how I can get the upgrade to continue?

--
Kai Stian Olstad
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
