[ceph-users] Re: ceph orchestator pulls strange images from docker.io

2023-09-15 Thread Eugen Block

Hi,

someone else had a similar issue [1]. To set the global container
image you can run:


$ ceph config set global container_image my-registry:5000/ceph/ceph:v17.2.6

I usually change that as soon as a cluster is up and running, or after
an upgrade, so there's no risk of pulling the wrong container images (I
assume in your case the local cephadm versions on the hosts differ and
therefore each one pulls a different default image hard-coded in the
cephadm binary).
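
One way to check that (a rough sketch, assuming cephadm is installed on
each host) is to compare the cluster-wide default with what the host-local
cephadm binary reports:

$ ceph config get mgr container_image      # cluster-wide default image the orchestrator uses
$ cephadm version                          # per host: version of the local cephadm binary
$ cephadm ls | grep container_image_name   # per host: images the deployed daemons actually reference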


You should probably be able to start a mgr daemon by temporarily
changing the unit.run file and replacing "CONTAINER_IMAGE" with a correct
image version (stop the pod first):


CONTAINER_IMAGE=my-registry/ceph/ceph-quincy@v17.2.6 (this is just an  
example).


The same line contains another image reference which you should also
change. Then restart that pod (e.g. with systemctl); hopefully you'll
then have a MGR up and running and be able to use the orchestrator again.
This procedure helped me in the past.
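
A rough sketch of those steps (fsid, daemon name and image are
placeholders, paths assume the default cephadm layout under /var/lib/ceph):

# stop the pod first
systemctl stop ceph-<fsid>@mgr.<name>.service
# fix both image references on the podman/docker run line
vi /var/lib/ceph/<fsid>/mgr.<name>/unit.run
# restart the daemon with the corrected image
systemctl start ceph-<fsid>@mgr.<name>.service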


Regards,
Eugen

[1]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/THAH2JFQNB7B4BPUHTRDPGXJ75WPNSNK/


Quoting Stefan Kooman:


On 15-09-2023 10:25, Stefan Kooman wrote:


I could just nuke the whole dev cluster, wipe all disks and start
fresh after reinstalling the hosts, but as I have to adopt 17
clusters to the orchestrator, I'd rather get some learnings from the
non-working setup


There is actually a cephadm "kill it with fire" option to do that
for you, but yeah, make sure you know how to fix it when things do
not go according to plan. It all magically works, until it doesn't.



cephadm rm-cluster --fsid your-fsid-here --force

... as a last resort (short of wipefs / shred on all disks).

Gr. Stefan



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orchestator pulls strange images from docker.io

2023-09-15 Thread Stefan Kooman

On 15-09-2023 10:25, Stefan Kooman wrote:


I could just nuke the whole dev cluster, wipe all disks and start
fresh after reinstalling the hosts, but as I have to adopt 17 clusters
to the orchestrator, I'd rather get some learnings from the non-working
setup


There is actually a cephadm "kill it with fire" option to do that for
you, but yeah, make sure you know how to fix it when things do not go
according to plan. It all magically works, until it doesn't.



cephadm rm-cluster --fsid your-fsid-here --force

... as a last resort (short of wipefs / shred on all disks).
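
If the disks should be wiped as well, a rough sketch (device paths and
host names are placeholders, double-check you are on the right host):

# while the orchestrator still works:
ceph orch device zap <host> /dev/sdX --force
# or after rm-cluster, on each host:
wipefs -a /dev/sdX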

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orchestator pulls strange images from docker.io

2023-09-15 Thread Stefan Kooman

On 15-09-2023 09:21, Boris Behrens wrote:

Hi Stefan,

the cluster is running 17.2.6 across the board. The mentioned containers
with other versions don't show up in ceph -s or ceph versions.

It looks like it is host related.
One host gets the correct 17.2.6 images, one gets the 16.2.11 images and
the third one uses the 17.0.0-7183-g54142666 images (whatever that is).


root@0cc47a6df330:~# ceph config-key get config/global/container_image
Error ENOENT:

root@0cc47a6df330:~# ceph config-key list |grep container_image
     "config-history/12/+mgr.0cc47a6df14e/container_image",
     "config-history/13/+mgr.0cc47aad8ce8/container_image",
     "config/mgr.0cc47a6df14e/container_image",
     "config/mgr.0cc47aad8ce8/container_image",

I've tried to set the default image with: ceph config-key set
config/global/container_image
quay.io/ceph/ceph:v17.2.6@sha256:6b0a24e3146d4723700ce6579d40e6016b2c63d9bf90422653f2d4caa49be232

But I cannot redeploy the mgr daemons, because there is no standby daemon.

root@0cc47a6df330:~# ceph orch redeploy mgr
Error EINVAL: Unable to schedule redeploy for mgr.0cc47aad8ce8: No 
standby MGR


But there should be:
root@0cc47a6df330:~# ceph orch ps
NAME                     HOST          PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.0cc47a6df14e.iltiot  0cc47a6df14e  *:9283  running (23s)  22s ago    2m   10.6M    -        16.2.11  de4b0b384ad4  0f31a162fa3e
mgr.0cc47aad8ce8         0cc47aad8ce8          running (16h)  8m ago     16h  591M     -        17.2.6   22cd8daf4d70  8145c63fdc44


I guess that one of the managers is not working correctly (probably the
16.2.11 version). IIRC I have changed the image reference for a
container (in the systemd unit files) once, when I managed to redeploy
all containers with a non-working image (test setup). So first make sure
which manager is actually running, then try to fix the other one by
editing the relevant config for that container (point it to the same
image as the running container). Pull the necessary image first if need
be. After you've got a standby manager up and running, you can redeploy
the necessary daemons. Be careful ... there are commands that redeploy
all daemons at the same time, and you normally don't want to do that ;-).
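
A rough sketch of that; the image, fsid and daemon name are assumptions
based on the ceph orch ps output above, adjust as needed:

# on the host running the broken mgr: pull the image the healthy mgr uses
podman pull quay.io/ceph/ceph:v17.2.6
# point both image references in that daemon's unit.run at it, then restart
vi /var/lib/ceph/<fsid>/mgr.0cc47a6df14e.iltiot/unit.run
systemctl restart ceph-<fsid>@mgr.0cc47a6df14e.iltiot.service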




root@0cc47a6df330:~# ceph orch ls
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr              2/2  8m ago     19h  0cc47a6df14e;0cc47a6df330;0cc47aad8ce8

I've also removed podman and containerd, deleted all directories and then
did a fresh reinstall of podman, which also did not work.
It's also strange that the daemons with the wonky version got an extra
suffix.


If I knew how, I would happily nuke the whole orchestrator, podman
and everything that goes along with it, and start over. In the end it is
not that hard to start some mgr/mon daemons without podman, so I would
be back to a classical cluster.
I tried this yesterday, but the daemons still use those very strange
images and I just don't understand why.


I could just nuke the whole dev cluster, wipe all disks and start fresh
after reinstalling the hosts, but as I have to adopt 17 clusters to the
orchestrator, I'd rather get some learnings from the non-working setup :)


There is actually a cephadm "kill it with fire" option to do that for 
you, but yeah, make sure you know how to fix it when things do not go 
according to plan. It all magically works, until it doesn't ;-).


Good luck, and keep us updated with any further challenges / progress.

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orchestator pulls strange images from docker.io

2023-09-15 Thread Boris Behrens
Hi Stefan,

the cluster is running 17.2.6 across the board. The mentioned containers
with other versions don't show up in ceph -s or ceph versions.
It looks like it is host related.
One host gets the correct 17.2.6 images, one gets the 16.2.11 images and
the third one uses the 17.0.0-7183-g54142666 images (whatever that is).

root@0cc47a6df330:~# ceph config-key get config/global/container_image
Error ENOENT:

root@0cc47a6df330:~# ceph config-key list |grep container_image
"config-history/12/+mgr.0cc47a6df14e/container_image",
"config-history/13/+mgr.0cc47aad8ce8/container_image",
"config/mgr.0cc47a6df14e/container_image",
"config/mgr.0cc47aad8ce8/container_image",

I've tried to set the default image with: ceph config-key set
config/global/container_image
quay.io/ceph/ceph:v17.2.6@sha256:6b0a24e3146d4723700ce6579d40e6016b2c63d9bf90422653f2d4caa49be232
But I cannot redeploy the mgr daemons, because there is no standby daemon.

root@0cc47a6df330:~# ceph orch redeploy mgr
Error EINVAL: Unable to schedule redeploy for mgr.0cc47aad8ce8: No standby
MGR

But there should be:
root@0cc47a6df330:~# ceph orch ps
NAME                     HOST          PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
mgr.0cc47a6df14e.iltiot  0cc47a6df14e  *:9283  running (23s)  22s ago    2m   10.6M    -        16.2.11  de4b0b384ad4  0f31a162fa3e
mgr.0cc47aad8ce8         0cc47aad8ce8          running (16h)  8m ago     16h  591M     -        17.2.6   22cd8daf4d70  8145c63fdc44

root@0cc47a6df330:~# ceph orch ls
NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
mgr  2/2  8m ago 19h  0cc47a6df14e;0cc47a6df330;0cc47aad8ce8

I've also removed podman and containerd, deleted all directories and then
did a fresh reinstall of podman, which also did not work.
It's also strange that the daemons with the wonky version got an extra
suffix.

If I knew how, I would happily nuke the whole orchestrator, podman and
everything that goes along with it, and start over. In the end it is not
that hard to start some mgr/mon daemons without podman, so I would be back
to a classical cluster.
I tried this yesterday, but the daemons still use those very strange images
and I just don't understand why.

I could just nuke the whole dev cluster, wipe all disks and start fresh
after reinstalling the hosts, but as I have to adopt 17 clusters to the
orchestrator, I'd rather get some learnings from the non-working setup :)

On Fri, 15 Sep 2023 at 08:26, Stefan Kooman wrote:

> On 14-09-2023 17:49, Boris Behrens wrote:
> > Hi,
> > I'm currently trying to adopt our stage cluster; some hosts just pull
> > strange images.
> >
> > root@0cc47a6df330:/var/lib/containers/storage/overlay-images# podman ps
> > CONTAINER ID  IMAGE                                            COMMAND               CREATED        STATUS            PORTS  NAMES
> > a532c37ebe42  docker.io/ceph/daemon-base:latest-master-devel   -n mgr.0cc47a6df3...  2 minutes ago  Up 2 minutes ago         ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df330-fxrfyl
> >
> > root@0cc47a6df330:~# ceph orch ps
> > NAME                     HOST                             PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION                IMAGE ID      CONTAINER ID
> > mgr.0cc47a6df14e.vqizdz  0cc47a6df14e.f00f.gridscale.dev  *:9283  running (3m)   3m ago     3m   10.8M    -        16.2.11                de4b0b384ad4  00b02cd82a1c
> > mgr.0cc47a6df330.iijety  0cc47a6df330.f00f.gridscale.dev  *:9283  running (5s)   2s ago     4s   10.5M    -        17.0.0-7183-g54142666  75e3d7089cea  662c6baa097e
> > mgr.0cc47aad8ce8         0cc47aad8ce8.f00f.gridscale.dev          running (65m)  8m ago     60m  553M     -        17.2.6                 22cd8daf4d70  8145c63fdc44
> >
> > Any idea what I need to do to change that?
>
> I want to get some things cleared up. What is the version you are
> running? I see three different ceph versions active now. I see you are
> running a podman ps command, but see docker images pulled. AFAIK podman
> needs a different IMAGE than docker ... or do you have a mixed setup?
>
> What does "ceph config-key get config/global/container_image" give you?
>
> ceph config-key list |grep container_image should give you a list
> (including config-history) where you can see what has been configured
> before.
>
> cephadm logs might give a clue as well.
>
> You can configure the IMAGE version / type that you want by setting the
> key and redeploy affected containers: For example (18.1.2):
>
> ceph config-key set config/global/container_image
>
> quay.io/ceph/ceph:v18.1.2@sha256:82a380c8127c42da406b7ce1281c2f3c0a86d4ba04b1f4b5f8d1036b8c24784f
>
> Gr. Stefan
>


-- 
The "UTF-8 problems" self-help group will, as an exception, meet in the
large hall this time.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orchestator pulls strange images from docker.io

2023-09-15 Thread Marc
> > I'm currently trying to adopt our stage cluster; some hosts just pull
> > strange images.
> >
> > root@0cc47a6df330:/var/lib/containers/storage/overlay-images# podman ps
> > CONTAINER ID  IMAGE                                            COMMAND               CREATED        STATUS            PORTS  NAMES
> > a532c37ebe42  docker.io/ceph/daemon-base:latest-master-devel   -n mgr.0cc47a6df3...  2 minutes ago  Up 2 minutes ago         ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df330-fxrfyl
> >
> > root@0cc47a6df330:~# ceph orch ps
> > NAME                     HOST                             PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION                IMAGE ID      CONTAINER ID
> > mgr.0cc47a6df14e.vqizdz  0cc47a6df14e.f00f.gridscale.dev  *:9283  running (3m)   3m ago     3m   10.8M    -        16.2.11                de4b0b384ad4  00b02cd82a1c
> > mgr.0cc47a6df330.iijety  0cc47a6df330.f00f.gridscale.dev  *:9283  running (5s)   2s ago     4s   10.5M    -        17.0.0-7183-g54142666  75e3d7089cea  662c6baa097e
> > mgr.0cc47aad8ce8         0cc47aad8ce8.f00f.gridscale.dev          running (65m)  8m ago     60m  553M     -        17.2.6                 22cd8daf4d70  8145c63fdc44
> >
> > Any idea what I need to do to change that?
> 
> I want to get some things cleared up. What is the version you are
> running? I see three different ceph versions active now. I see you are
> running a podman ps command, but see docker images pulled. AFAIK podman
> needs a different IMAGE than docker ... or do you have a mixed setup?

Podman does not need different images. I think lots of container
orchestrators use the docker image format. AFAIK podman is mostly a fork
of docker.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orchestator pulls strange images from docker.io

2023-09-15 Thread Stefan Kooman

On 14-09-2023 17:49, Boris Behrens wrote:

Hi,
I'm currently trying to adopt our stage cluster; some hosts just pull
strange images.

root@0cc47a6df330:/var/lib/containers/storage/overlay-images# podman ps
CONTAINER ID  IMAGE                                            COMMAND               CREATED        STATUS            PORTS  NAMES
a532c37ebe42  docker.io/ceph/daemon-base:latest-master-devel   -n mgr.0cc47a6df3...  2 minutes ago  Up 2 minutes ago         ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df330-fxrfyl

root@0cc47a6df330:~# ceph orch ps
NAME                     HOST                             PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION                IMAGE ID      CONTAINER ID
mgr.0cc47a6df14e.vqizdz  0cc47a6df14e.f00f.gridscale.dev  *:9283  running (3m)   3m ago     3m   10.8M    -        16.2.11                de4b0b384ad4  00b02cd82a1c
mgr.0cc47a6df330.iijety  0cc47a6df330.f00f.gridscale.dev  *:9283  running (5s)   2s ago     4s   10.5M    -        17.0.0-7183-g54142666  75e3d7089cea  662c6baa097e
mgr.0cc47aad8ce8         0cc47aad8ce8.f00f.gridscale.dev          running (65m)  8m ago     60m  553M     -        17.2.6                 22cd8daf4d70  8145c63fdc44

Any idea what I need to do to change that?


I want to get some things cleared up. What is the version you are 
running? I see three different ceph versions active now. I see you are 
running a podman ps command, but see docker images pulled. AFAIK podman 
needs a different IMAGE than docker ... or do you have a mixed setup?


What does "ceph config-key get config/global/container_image" give you?

ceph config-key list |grep container_image should give you a list 
(including config-history) where you can see what has been configured 
before.


cephadm logs might give a clue as well.
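
For example (a sketch; the daemon name is a placeholder taken from the
ceph orch ps output above):

cephadm logs --name mgr.0cc47a6df330.iijety
cephadm ls | grep container_image_name   # image each deployed daemon on this host references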

You can configure the IMAGE version / type that you want by setting the
key and redeploying the affected containers. For example (18.1.2):


ceph config-key set config/global/container_image 
quay.io/ceph/ceph:v18.1.2@sha256:82a380c8127c42da406b7ce1281c2f3c0a86d4ba04b1f4b5f8d1036b8c24784f
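
Then redeploy the affected daemons, for example (a sketch; the daemon
name is taken from the ceph orch ps output above):

ceph orch daemon redeploy mgr.0cc47a6df14e.vqizdz quay.io/ceph/ceph:v18.1.2
# or, once a standby mgr is available, the whole service at once:
ceph orch redeploy mgr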


Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io