[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Sebastian Wagner
yes

On 11.03.21 at 15:46, Kai Stian Olstad wrote:
> Hi Sebastian
> 
> On 11.03.2021 13:13, Sebastian Wagner wrote:
>> looks like
>>
>> $ ssh pech-hd-009
>> # cephadm ls
>>
>> is returning this non-existent OSD.
>>
>> can you verify that `cephadm ls` on that host doesn't
>> print osd.355 ?
> 
> "cephadm ls" on the node does list this drive
> 
> {
>     "style": "cephadm:v1",
>     "name": "osd.355",
>     "fsid": "3614abcc-201c-11eb-995a-2794bcc75ae0",
>     "systemd_unit": "ceph-3614abcc-201c-11eb-995a-2794bcc75ae0@osd.355",
>     "enabled": true,
>     "state": "stopped",
>     "container_id": null,
>     "container_image_name": "goharbor.example.com/library/ceph/ceph:v15.2.5",
>     "container_image_id": null,
>     "version": null,
>     "started": null,
>     "created": "2021-01-20T09:53:22.229080",
>     "deployed": "2021-02-09T09:24:02.855576",
>     "configured": "2021-02-09T09:24:04.211587"
> }
> 
> 
> To resolve it, could I just remove it with "cephadm rm-daemon"?
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Sebastian Wagner
Hi Kai,

looks like

$ ssh pech-hd-009
# cephadm ls

is returning this non-existent OSD.

can you verify that `cephadm ls` on that host doesn't
print osd.355 ?

Best,
Sebastian

On 11.03.21 at 12:16, Kai Stian Olstad wrote:
> Before I started the upgrade the cluster was healthy, but one
> OSD (osd.355) was down; I can't remember if it was in or out.
> Upgrade was started with
>     ceph orch upgrade start --image goharbor.example.com/library/ceph/ceph:v15.2.9
> 
> The upgrade started but when Ceph tried to upgrade osd.355 it paused
> with the following messages:
> 
>     2021-03-11T09:15:35.638104+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Target is goharbor.example.com/library/ceph/ceph:v15.2.9 with id dfc48307963697ff48acd9dd6fda4a7a24017b9d8124f86c2a542b0802fe77ba
>     2021-03-11T09:15:35.639882+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking mgr daemons...
>     2021-03-11T09:15:35.644170+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: All mgr daemons are up to date.
>     2021-03-11T09:15:35.644376+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking mon daemons...
>     2021-03-11T09:15:35.647669+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: All mon daemons are up to date.
>     2021-03-11T09:15:35.647866+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking crash daemons...
>     2021-03-11T09:15:35.652035+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Setting container_image for all crash...
>     2021-03-11T09:15:35.653683+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: All crash daemons are up to date.
>     2021-03-11T09:15:35.653896+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Checking osd daemons...
>     2021-03-11T09:15:36.273345+ mgr.pech-mon-2.cjeiyc [INF] It is presumed safe to stop ['osd.355']
>     2021-03-11T09:15:36.273504+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: It is presumed safe to stop ['osd.355']
>     2021-03-11T09:15:36.273887+ mgr.pech-mon-2.cjeiyc [INF] Upgrade: Redeploying osd.355
>     2021-03-11T09:15:36.276673+ mgr.pech-mon-2.cjeiyc [ERR] Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.355 on host pech-hd-009 failed.
> 
> 
> One of the first things the upgrade did was to upgrade the mons, so they were
> restarted, and now osd.355 no longer exists:
> 
>     $ ceph osd info osd.355
>     Error EINVAL: osd.355 does not exist
> 
> But if I run a resume
>     ceph orch upgrade resume
> it still tries to upgrade osd.355, same message as above.
> 
> I tried to stop and start the upgrade again with
>     ceph orch upgrade stop
>     ceph orch upgrade start --image goharbor.example.com/library/ceph/ceph:v15.2.9
> it still tries to upgrade osd.355, with the same message as above.
> 
> Looking at the source code, it looks like it gets the daemons to upgrade from
> the mgr cache, so I restarted both mgrs, but it still tries to upgrade osd.355.
> 
> 
> Does anyone know how I can get the upgrade to continue?
> 
> -- 
> Kai Stian Olstad
> 
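The pause message in the quoted log names both the daemon and the host that blocked the upgrade. As a rough sketch, those fields can be pulled out with a regular expression. The pattern is inferred only from the single `[ERR]` line in this thread, so treat the message format as an assumption; `parse_pause` and `PAUSE_PATTERN` are illustrative names, not part of Ceph.

```python
import re

# The [ERR] message from the log above, as a single string.
MESSAGE = (
    "Upgrade: Paused due to UPGRADE_REDEPLOY_DAEMON: "
    "Upgrading daemon osd.355 on host pech-hd-009 failed."
)

# Assumed message layout, inferred from the one example in this thread.
PAUSE_PATTERN = re.compile(
    r"Paused due to (?P<code>\w+): "
    r"Upgrading daemon (?P<daemon>\S+) on host (?P<host>\S+) failed\."
)


def parse_pause(message):
    """Return the error code, daemon and host from a pause message, or None."""
    match = PAUSE_PATTERN.search(message)
    return match.groupdict() if match else None


print(parse_pause(MESSAGE))
```

The host it returns is where `cephadm ls` should be checked next.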



[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad

On 11.03.2021 15:47, Sebastian Wagner wrote:
> yes
>
> On 11.03.21 at 15:46, Kai Stian Olstad wrote:
>> To resolve it, could I just remove it with "cephadm rm-daemon"?


That worked like a charm, and the upgrade has resumed.

Thank you Sebastian.
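
For anyone hitting the same problem, here is a minimal dry-run sketch of the two steps confirmed in this thread: remove the stale daemon on the affected host, then resume the upgrade. The thread does not show the exact `cephadm rm-daemon` invocation that was used. The `--name`, `--fsid` and `--force` options below are my assumption of what is needed, so check `cephadm rm-daemon --help` on your version first. `build_cleanup_commands` and `run` are illustrative helper names.

```python
import shlex
import subprocess


def build_cleanup_commands(fsid, daemon_name):
    """Return the host-side removal command and the upgrade resume command."""
    # Assumption: --name, --fsid and --force are the rm-daemon options needed
    # here; verify with `cephadm rm-daemon --help` before running anything.
    remove = ["cephadm", "rm-daemon", "--name", daemon_name, "--fsid", fsid, "--force"]
    resume = ["ceph", "orch", "upgrade", "resume"]
    return [remove, resume]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


commands = build_cleanup_commands("3614abcc-201c-11eb-995a-2794bcc75ae0", "osd.355")
run(commands)  # dry run: only prints the two commands
```

The removal command has to run as root on the host that still lists the daemon (pech-hd-009 here), while the resume command runs wherever the `ceph` CLI has admin access. Keeping the default `dry_run=True` lets you review both commands before anything is executed.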

--
Kai Stian Olstad


[ceph-users] Re: Cephadm: Upgrade 15.2.5 -> 15.2.9 stops on non existing OSD

2021-03-11 Thread Kai Stian Olstad

Hi Sebastian

On 11.03.2021 13:13, Sebastian Wagner wrote:
> looks like
>
> $ ssh pech-hd-009
> # cephadm ls
>
> is returning this non-existent OSD.
>
> can you verify that `cephadm ls` on that host doesn't
> print osd.355 ?


"cephadm ls" on the node does list this drive

{
    "style": "cephadm:v1",
    "name": "osd.355",
    "fsid": "3614abcc-201c-11eb-995a-2794bcc75ae0",
    "systemd_unit": "ceph-3614abcc-201c-11eb-995a-2794bcc75ae0@osd.355",
    "enabled": true,
    "state": "stopped",
    "container_id": null,
    "container_image_name": "goharbor.example.com/library/ceph/ceph:v15.2.5",
    "container_image_id": null,
    "version": null,
    "started": null,
    "created": "2021-01-20T09:53:22.229080",
    "deployed": "2021-02-09T09:24:02.855576",
    "configured": "2021-02-09T09:24:04.211587"
}
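
The mismatch shown by the entry above (a daemon that `cephadm ls` still lists on the host but the cluster no longer knows) can be checked with a short script. This is only a sketch under two assumptions: that `cephadm ls` prints a JSON array of entries like the one above, and that the cluster's OSD IDs are collected separately, for example from `ceph osd ls`. `find_stale_osds` and the sample ID set are hypothetical.

```python
import json

# Trimmed sample of `cephadm ls` output, using the entry quoted in this thread.
CEPHADM_LS_OUTPUT = """
[
  {
    "style": "cephadm:v1",
    "name": "osd.355",
    "fsid": "3614abcc-201c-11eb-995a-2794bcc75ae0",
    "enabled": true,
    "state": "stopped",
    "container_id": null
  }
]
"""


def find_stale_osds(cephadm_ls_json, cluster_osd_ids):
    """Return names of OSD daemons listed on the host but unknown to the cluster."""
    stale = []
    for daemon in json.loads(cephadm_ls_json):
        daemon_type, _, daemon_id = daemon["name"].partition(".")
        if daemon_type != "osd":
            continue  # only OSD entries are compared against the OSD ID set
        if int(daemon_id) not in cluster_osd_ids:
            stale.append(daemon["name"])
    return stale


# Hypothetical set of OSD IDs known to the cluster; 355 is absent.
known_osd_ids = {353, 354, 356}
print(find_stale_osds(CEPHADM_LS_OUTPUT, known_osd_ids))
```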


To resolve it, could I just remove it with "cephadm rm-daemon"?

--
Kai Stian Olstad