[ceph-users] Re: Missing keyrings on upgraded cluster
Okay, now I feel stupid, but you helped point me to my mistake :-D While collecting the CLI output you mentioned, I noticed that the disks have different sizes: the one I wanted to replace was only 15GB instead of the 20GB my drivegroup requires. Thanks for pointing that out, I have no idea why I chose different sizes :-D I will recreate the OSDs with identical disk sizes in order to have one applicable drivegroup for all hosts.

Thanks again!
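Here is how the root cause plays out in practice. The drive group in this thread filters data devices with `rotational: 1` and `size: '20G:'` (20G or larger), and DB devices with `rotational: 0` and `size: '13G:16G'`, combined with `filter_logic: AND`. The Python sketch below approximates how `LOW:HIGH`, `LOW:`, `:HIGH` and exact size filters behave. It was written for illustration and is not cephadm's actual matcher. It assumes decimal units (M, G, T), and the inventory (`/dev/sdx`, `/dev/sdy`, `/dev/sdz` and their sizes) is made up.

```python
import re

# Assumption: decimal unit factors relative to GB; cephadm's real matcher may differ.
UNIT_TO_GB = {"M": 1e-3, "G": 1.0, "T": 1e3}


def to_gb(value: str) -> float:
    """Convert a size string such as '20G' or '1.5T' to gigabytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([MGT])B?\s*", value, re.IGNORECASE)
    if m is None:
        raise ValueError(f"unsupported size: {value!r}")
    return float(m.group(1)) * UNIT_TO_GB[m.group(2).upper()]


def size_matches(spec: str, disk_gb: float) -> bool:
    """Evaluate 'LOW:HIGH', 'LOW:', ':HIGH' or an exact size against a disk size."""
    if ":" not in spec:
        return disk_gb == to_gb(spec)
    low, high = spec.split(":", 1)
    if low and disk_gb < to_gb(low):
        return False
    if high and disk_gb > to_gb(high):
        return False
    return True


def device_matches(device: dict, flt: dict) -> bool:
    """Every filter key must match (the filter_logic: AND case)."""
    if "rotational" in flt and int(device["rotational"]) != int(flt["rotational"]):
        return False
    if "size" in flt and not size_matches(flt["size"], device["size_gb"]):
        return False
    return True


# Filters copied from the spec in this thread; the inventory is hypothetical.
data_filter = {"rotational": 1, "size": "20G:"}
db_filter = {"rotational": 0, "size": "13G:16G"}
inventory = [
    {"path": "/dev/sdx", "rotational": 1, "size_gb": 20.0},
    {"path": "/dev/sdy", "rotational": 1, "size_gb": 15.0},  # the undersized HDD
    {"path": "/dev/sdz", "rotational": 0, "size_gb": 15.0},
]

data_devices = [d["path"] for d in inventory if device_matches(d, data_filter)]
db_devices = [d["path"] for d in inventory if device_matches(d, db_filter)]
print(data_devices)  # ['/dev/sdx']
print(db_devices)    # ['/dev/sdz']
```

In this model the 15G HDD silently drops out of the data device list, so the spec has nothing to deploy on that disk. That fits the "nothing happens" symptom described in this thread.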
[ceph-users] Re: Missing keyrings on upgraded cluster
So "ceph osd tree destroyed -f json-pretty" shows the nautilus2 host with the OSD id you're trying to replace here? And are there disks marked available that match the spec (a 20G rotational disk in this case, I guess) in "ceph orch device ls nautilus2"?

On Mon, Feb 20, 2023 at 10:16 AM Eugen Block wrote:
> I stumbled upon this option 'osd_id_claims' [2], so I tried to apply a replace.yaml to redeploy only the one destroyed disk, but still nothing happens with that disk.
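Adam's first check can be scripted. The rough Python sketch below turns the JSON from `ceph osd tree destroyed -f json` into a mapping from host to destroyed OSD ids. That is the same shape as the `osd_id_claims` entry (`{'nautilus2': ['7']}`) seen in this thread. The sketch assumes the output has a `nodes` list in which host entries carry `children` and OSD entries carry a `status` field. The sample is hand-written for illustration, not captured from this cluster, so check the field names against your own output.

```python
from __future__ import annotations

import json


def destroyed_osds_by_host(tree: dict) -> dict[str, list[str]]:
    """Map host name -> ids of its destroyed OSDs, as strings.

    The result has the same shape as osd_id_claims, e.g. {'nautilus2': ['7']}.
    Assumes host nodes list their OSDs in "children" and OSD nodes carry "status".
    """
    nodes = {node["id"]: node for node in tree.get("nodes", [])}
    result: dict[str, list[str]] = {}
    for node in nodes.values():
        if node.get("type") != "host":
            continue
        ids = [
            str(child)
            for child in node.get("children", [])
            if nodes.get(child, {}).get("status") == "destroyed"
        ]
        if ids:
            result[node["name"]] = sorted(ids, key=int)
    return result


# Hand-written sample in the assumed format, not real output from this cluster.
SAMPLE = """
{"nodes": [
  {"id": -1, "name": "default", "type": "root", "children": [-3]},
  {"id": -3, "name": "nautilus2", "type": "host", "children": [7, 2]},
  {"id": 7, "name": "osd.7", "type": "osd", "status": "destroyed"},
  {"id": 2, "name": "osd.2", "type": "osd", "status": "up"}
], "stray": []}
"""

claims = destroyed_osds_by_host(json.loads(SAMPLE))
print(claims)  # {'nautilus2': ['7']}
```

Adam's second check has to be done separately: `ceph orch device ls nautilus2` should list an available disk that matches the spec. In this thread, that check is where the 15GB disk showed up.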
[ceph-users] Re: Missing keyrings on upgraded cluster
I stumbled upon this option 'osd_id_claims' [2], so I tried to apply a replace.yaml to redeploy only the one destroyed disk, but still nothing happens with that disk. This is my replace.yaml:

---snip---
nautilus:~ # cat replace-osd-7.yaml
service_type: osd
service_name: osd
placement:
  hosts:
  - nautilus2
spec:
  data_devices:
    rotational: 1
    size: '20G:'
  db_devices:
    rotational: 0
    size: '13G:16G'
  filter_logic: AND
  objectstore: bluestore
osd_id_claims:
  nautilus2: ['7']
---snip---

I see these lines in the mgr.log:

Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log [INF] : Found osd claims -> {'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: [cephadm INFO cephadm.services.osd] Found osd claims for drivegroup None -> {'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log [INF] : Found osd claims for drivegroup None -> {'nautilus2': ['7']}

But I see no attempt to actually deploy the OSD.

[2] https://docs.ceph.com/en/quincy/mgr/orchestrator_modules/#orchestrator-osd-replace
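One way to read these mgr.log lines is that the claim itself was parsed, but a claimed id can only be used once a device that passes the spec's filters is available on that host. The Python sketch below is a deliberately simplified toy model of that idea. The function name and pairing rule are hypothetical, not cephadm's implementation, and `/dev/sdx` is a made-up device. With no eligible device on nautilus2, the plan is empty even though the claim exists.

```python
from __future__ import annotations


def plan_osd_deployments(
    claims: dict[str, list[str]], eligible: dict[str, list[str]]
) -> list[tuple[str, str, str | None]]:
    """Toy model: pair each eligible device with a claimed OSD id on its host.

    Hypothetical simplification, not cephadm's code. Devices beyond the
    number of claimed ids get None, meaning "allocate a fresh id".
    """
    plan = []
    for host, devices in eligible.items():
        remaining = list(claims.get(host, []))
        for device in devices:
            osd_id = remaining.pop(0) if remaining else None
            plan.append((host, device, osd_id))
    return plan


claims = {"nautilus2": ["7"]}
print(plan_osd_deployments(claims, {"nautilus2": []}))
# []
print(plan_osd_deployments(claims, {"nautilus2": ["/dev/sdx"]}))
# [('nautilus2', '/dev/sdx', '7')]
```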
[ceph-users] Re: Missing keyrings on upgraded cluster
To be honest, I haven't looked too closely for open tracker issues regarding ceph-volume. I'm still not even sure whether I'm doing something wrong or whether it's an actual Ceph issue. I still have a couple of OSDs left to play around with in this cluster, so I tried it with a different OSD. It shows up as "destroyed" in the osd tree, but the orchestrator isn't redeploying it, although the OSD disk and the corresponding block.db LV have been wiped. There's nothing in cephadm.log except "check-host" and "gather-facts". If I removed the destroyed OSD from the CRUSH map, I'm sure it would be redeployed successfully; it was earlier. Any idea why it's not redeployed?
[ceph-users] Re: Missing keyrings on upgraded cluster
For reference, a stray daemon from cephadm's point of view is roughly just something that shows up in "ceph node ls" but doesn't have a directory in /var/lib/ceph/. I guess creating the OSD manually, as you did, means that directory never got created. I remember the manual OSD creation process (by manual I just mean not using an orchestrator/cephadm mgr module command) coming up at one point, and we ended up manually running "cephadm deploy" to make sure those directories got created correctly, but I don't think any docs were ever written about it (yet, anyway). Also, is there a tracker issue for it not correctly handling the drivegroup?

On Mon, Feb 20, 2023 at 8:58 AM Eugen Block wrote:
> Thanks, Adam.
>
> Providing the keyring to the cephadm command worked, but the unwanted (but expected) side effect is that from cephadm's perspective it's a stray daemon.
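Adam's definition of a stray daemon boils down to a set difference: daemons the cluster reports (a stand-in here for what "ceph node ls" shows) minus the daemon directories cephadm has on the host. The minimal Python sketch below assumes cephadm keeps one `<type>.<id>` directory (e.g. `osd.5`) per daemon under the cluster's `/var/lib/ceph/<fsid>/` directory. The function names and the temporary directory standing in for that path are illustrative only.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def daemon_dirs(cluster_dir: Path) -> set[str]:
    """Names of '<type>.<id>' subdirectories, e.g. 'osd.5' (assumed layout)."""
    return {p.name for p in cluster_dir.iterdir() if p.is_dir() and "." in p.name}


def find_stray(reported: set[str], cluster_dir: Path) -> set[str]:
    """Daemons the cluster reports that have no cephadm directory on this host."""
    return reported - daemon_dirs(cluster_dir)


with tempfile.TemporaryDirectory() as tmp:
    cluster_dir = Path(tmp)  # stands in for /var/lib/ceph/<fsid>
    for name in ("osd.2", "osd.3", "crash"):
        (cluster_dir / name).mkdir()
    # In this toy setup osd.5 was created by hand with 'cephadm ceph-volume
    # lvm create', so no daemon directory exists for it.
    stray = find_stray({"osd.2", "osd.3", "osd.5"}, cluster_dir)

print(sorted(stray))  # ['osd.5']
```

In this model, the fix is to make the directory exist rather than to change what the cluster reports. Per Adam, running "cephadm deploy" is what achieved that in the earlier case.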
[ceph-users] Re: Missing keyrings on upgraded cluster
Thanks, Adam.

Providing the keyring to the cephadm command worked, but the unwanted (though expected) side effect is that, from cephadm's perspective, it's a stray daemon. For some reason the orchestrator did apply the desired drivegroup when I tried to reproduce this morning, but then it failed again just now when I wanted to get rid of the stray daemon. This is one of the most annoying things about cephadm: I still don't fully understand when it will correctly apply the identical drivegroup.yml and when it won't. Anyway, the conclusion is not to interfere with cephadm (nothing new here), but since the drivegroup was not applied correctly, I assumed I had to "help out" a bit by manually deploying an OSD.

Thanks,
Eugen
[ceph-users] Re: Missing keyrings on upgraded cluster
Going off of

ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json

you could try passing "--keyring …" to the cephadm ceph-volume command. Something like 'cephadm ceph-volume --keyring … -- lvm create'. I'm guessing it's trying to run the osd tree command within a container, and I know cephadm mounts keyrings passed to the ceph-volume command as "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.

On Mon, Feb 20, 2023 at 6:35 AM Eugen Block wrote:
> Hi *,
>
> I was playing around on an upgraded test cluster (from N to Q), current version:
>
> "overall": {
>     "ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
> }
>
> I tried to replace an OSD after destroying it with 'ceph orch osd rm osd.5 --replace'. The OSD was drained successfully and marked as "destroyed" as expected, and the zapping also worked. At this point I didn't have an OSD spec in place because all OSDs were adopted during the upgrade process. So I created a new spec, which was not applied successfully (I'm wondering if there's another/new issue with ceph-volume, but that's not the focus here), so I tried it manually with 'cephadm ceph-volume lvm create'. I'll add the output at the end for better readability. Apparently, there's no bootstrap-osd keyring for cephadm, so it can't look up the desired osd_id in the osd tree; the command it tries is this:
>
> ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>
> In the local filesystem the required keyring is present, though:
>
> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
> [client.bootstrap-osd]
>     key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
>     caps mgr = "allow r"
>     caps mon = "profile bootstrap-osd"
>
> Is there something missing in the adoption process? Or are the docs lacking some upgrade info? I found a section about putting keyrings under management [1], but I'm not sure if that's what's missing here.
> Any insights are highly appreciated!
>
> Thanks,
> Eugen
>
> [1] https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under-management
>
> ---snip---
> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
> Inferring fsid
> Using recent ceph image /ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92 -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/:/var/run/ceph:z -v /var/log/ceph/:/var/log/ceph:z -v /var/lib/ceph//crash:/var/lib/ceph/crash:z -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z /ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92 lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size 5G
> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping"
> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from \"/etc/containers/mounts.conf\" doesn't exist, skipping"
> /usr/bin/podman: stderr Running command: /usr/bin/ceph-authtool --gen-print-key
> /usr/bin/podman: stderr Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+ 7fd255e30700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin: (2) No such file or directory
> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.848+ 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin, disabling cephx
> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+ 7fd255e30700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.852+ 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> /usr/bin/podman: stderr stderr: 2023-02-20T08:02:50.856+ 7fd255e30700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/c
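Adam's guess lines up with the podman command in the log. The container gets bind mounts for /var/run/ceph, /var/log/ceph, /dev, the host's / at /rootfs and a temporary ceph.conf, but nothing at /var/lib/ceph/bootstrap-osd/ceph.keyring. Inside the container, the host's keyring is therefore only reachable under /rootfs.

The Python toy model below resolves a container path through a subset of those bind mounts. It ignores files baked into the image. The "fixed" mount that maps the keyring path is an assumption based on Adam's description, and the host-side source path used for it is illustrative.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def resolve_on_host(container_path: str, mounts: dict[str, str]) -> str | None:
    """Map a container path to a host path via bind mounts (destination -> source).

    Toy model: the most specific matching mount wins; image contents are ignored.
    """
    target = PurePosixPath(container_path)
    best = None
    for dest, src in mounts.items():
        dest_p = PurePosixPath(dest)
        if target == dest_p or dest_p in target.parents:
            if best is None or len(dest_p.parts) > len(best[0].parts):
                best = (dest_p, PurePosixPath(src))
    if best is None:
        return None
    dest_p, src_p = best
    return str(src_p / target.relative_to(dest_p))


KEYRING = "/var/lib/ceph/bootstrap-osd/ceph.keyring"
host_files = {KEYRING, "/tmp/ceph-tmpuydvbhuk"}  # the keyring exists on the host

# Subset of the -v flags from the failing podman run above.
failing_mounts = {"/rootfs": "/", "/etc/ceph/ceph.conf": "/tmp/ceph-tmpuydvbhuk"}
# Assumption: passing --keyring adds a mount at the path ceph-volume expects.
fixed_mounts = {**failing_mounts, KEYRING: KEYRING}


def visible(path: str, mounts: dict[str, str]) -> bool:
    """True if the container path resolves to a file that exists on the host."""
    host_path = resolve_on_host(path, mounts)
    return host_path is not None and host_path in host_files


print(visible(KEYRING, failing_mounts))              # False
print(visible("/rootfs" + KEYRING, failing_mounts))  # True
print(visible(KEYRING, fixed_mounts))                # True
```

In this model the host file is there all along. It just isn't at the path the containerized ceph command asks for. That would explain why handing the keyring to cephadm made the command work, rather than any fix on the host.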