[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Eugen Block
Okay, now I feel stupid, but you helped point me to my mistake :-D
While collecting the CLI output you mentioned I noticed that the disks
have different sizes; the one I wanted to replace was only 15 GB
instead of the 20 GB my drivegroup requires. Thanks for pointing that
out, I have no idea why I chose different sizes :-D
I will recreate the OSDs with identical disk sizes in order to have  
one applicable drivegroup for all hosts.
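
For the record, this is roughly how I'm double-checking the reported
device sizes against the spec now (just a sketch; 'osds.yaml' stands
for whatever drivegroup file is actually in use):

---snip---
# what cephadm reports per device (size, rotational, available)
nautilus:~ # ceph orch device ls nautilus2
# preview what the orchestrator would deploy without touching anything
nautilus:~ # ceph orch apply -i osds.yaml --dry-run
---snip---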

Thanks again!

Quoting Adam King:


so "ceph osd tree destroyed -f json-pretty" shows the nautilus2 host with
the osd id you're trying to replace here? And there are disks marked
available that match the spec (20G rotational disk in this case I guess) in
"ceph orch device ls nautilus2"?

On Mon, Feb 20, 2023 at 10:16 AM Eugen Block  wrote:


I stumbled upon this option 'osd_id_claims' [2], so I tried to apply a
replace.yaml to redeploy only the one destroyed disk, but still
nothing happens with that disk. This is my replace.yaml:

---snip---
nautilus:~ # cat replace-osd-7.yaml
service_type: osd
service_name: osd
placement:
  hosts:
  - nautilus2
spec:
  data_devices:
    rotational: 1
    size: '20G:'
  db_devices:
    rotational: 0
    size: '13G:16G'
  filter_logic: AND
  objectstore: bluestore
osd_id_claims:
  nautilus2: ['7']
---snip---

I see these lines in the mgr.log:

Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log
[INF] : Found osd claims -> {'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: [cephadm INFO
cephadm.services.osd] Found osd claims for drivegroup None ->
{'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log
[INF] : Found osd claims for drivegroup None -> {'nautilus2': ['7']}

But I see no attempt to actually deploy the OSD.

[2]

https://docs.ceph.com/en/quincy/mgr/orchestrator_modules/#orchestrator-osd-replace

Quoting Adam King:

> For reference, a stray daemon from cephadm POV is roughly just something
> that shows up in "ceph node ls" that doesn't have a directory in
> /var/lib/ceph/. I guess manually making the OSD as you did means
that
> didn't end up getting made. I remember the manual osd creation process (by
> manual just meaning not using an orchestrator/cephadm mgr module command)
> coming up at one point and then we ended up manually running "cephadm
> deploy" to make sure those directories get created correctly, but I don't
> think any docs ever got made about it (yet, anyway). Also, is there a
> tracker issue for it not correctly handling the drivegroup?
>
> On Mon, Feb 20, 2023 at 8:58 AM Eugen Block  wrote:
>
>> Thanks, Adam.
>>
>> Providing the keyring to the cephadm command worked, but the unwanted
>> (but expected) side effect is that from cephadm perspective it's a
>> stray daemon. For some reason the orchestrator did apply the desired
>> drivegroup when I tried to reproduce this morning, but then again
>> failed just now when I wanted to get rid of the stray daemon. This is
>> one of the most annoying things with cephadm, I still don't fully
>> understand when it will correctly apply the identical drivegroup.yml
>> and when not. Anyway, the conclusion is to not interfere with cephadm
>> (nothing new here), but since the drivegroup was not applied correctly
>> I assumed I had to "help out" a bit by manually deploying an OSD.
>>
>> Thanks,
>> Eugen
>>
>> Quoting Adam King:
>>
>> > Going off of
>> >
>> > ceph --cluster ceph --name client.bootstrap-osd --keyring
>> > /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>> >
>> > you could try passing "--keyring <keyring-path>" to the cephadm
>> > ceph-volume command. Something like 'cephadm ceph-volume --keyring
>> > <keyring-path> -- lvm create'. I'm guessing it's trying to run the
>> > osd tree command within a container and I know cephadm mounts keyrings
>> > passed to the ceph-volume command as
>> > "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.
>> >
>> > On Mon, Feb 20, 2023 at 6:35 AM Eugen Block  wrote:
>> >
>> >> Hi *,
>> >>
>> >> I was playing around on an upgraded test cluster (from N to Q),
>> >> current version:
>> >>
>> >>  "overall": {
>> >>  "ceph version 17.2.5
>> >> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
>> >>  }
>> >>
>> >> I tried to replace an OSD after destroying it with 'ceph orch osd rm
>> >> osd.5 --replace'. The OSD was drained successfully and marked as
>> >> "destroyed" as expected, the zapping also worked. At this point I
>> >> didn't have an osd spec in place because all OSDs were adopted during
>> >> the upgrade process. So I created a new spec which was not applied
>> >> successfully (I'm wondering if there's another/new issue with
>> >> ceph-volume, but that's not the focus here), so I tried it manually
>> >> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
>> for better readability. Apparently, there's no bootstrap-osd keyring
>> for cephadm so it can't search the desired osd_id in the osd tree, the
>> command it tries is this:

[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Adam King
so "ceph osd tree destroyed -f json-pretty" shows the nautilus2 host with
the osd id you're trying to replace here? And there are disks marked
available that match the spec (20G rotational disk in this case I guess) in
"ceph orch device ls nautilus2"?

On Mon, Feb 20, 2023 at 10:16 AM Eugen Block  wrote:

> I stumbled upon this option 'osd_id_claims' [2], so I tried to apply a
> replace.yaml to redeploy only the one destroyed disk, but still
> nothing happens with that disk. This is my replace.yaml:
>
> ---snip---
> nautilus:~ # cat replace-osd-7.yaml
> service_type: osd
> service_name: osd
> placement:
>   hosts:
>   - nautilus2
> spec:
>   data_devices:
>     rotational: 1
>     size: '20G:'
>   db_devices:
>     rotational: 0
>     size: '13G:16G'
>   filter_logic: AND
>   objectstore: bluestore
> osd_id_claims:
>   nautilus2: ['7']
> ---snip---
>
> I see these lines in the mgr.log:
>
> Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log
> [INF] : Found osd claims -> {'nautilus2': ['7']}
> Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: [cephadm INFO
> cephadm.services.osd] Found osd claims for drivegroup None ->
> {'nautilus2': ['7']}
> Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log
> [INF] : Found osd claims for drivegroup None -> {'nautilus2': ['7']}
>
> But I see no attempt to actually deploy the OSD.
>
> [2]
>
> https://docs.ceph.com/en/quincy/mgr/orchestrator_modules/#orchestrator-osd-replace
>
> Quoting Adam King:
>
> > For reference, a stray daemon from cephadm POV is roughly just something
> > that shows up in "ceph node ls" that doesn't have a directory in
> > /var/lib/ceph/. I guess manually making the OSD as you did means
> that
> > didn't end up getting made. I remember the manual osd creation process (by
> > manual just meaning not using an orchestrator/cephadm mgr module command)
> > coming up at one point and then we ended up manually running "cephadm
> > deploy" to make sure those directories get created correctly, but I don't
> > think any docs ever got made about it (yet, anyway). Also, is there a
> > tracker issue for it not correctly handling the drivegroup?
> >
> > On Mon, Feb 20, 2023 at 8:58 AM Eugen Block  wrote:
> >
> >> Thanks, Adam.
> >>
> >> Providing the keyring to the cephadm command worked, but the unwanted
> >> (but expected) side effect is that from cephadm perspective it's a
> >> stray daemon. For some reason the orchestrator did apply the desired
> >> drivegroup when I tried to reproduce this morning, but then again
> >> failed just now when I wanted to get rid of the stray daemon. This is
> >> one of the most annoying things with cephadm, I still don't fully
> >> understand when it will correctly apply the identical drivegroup.yml
> >> and when not. Anyway, the conclusion is to not interfere with cephadm
> >> (nothing new here), but since the drivegroup was not applied correctly
> >> I assumed I had to "help out" a bit by manually deploying an OSD.
> >>
> >> Thanks,
> >> Eugen
> >>
> >> Quoting Adam King:
> >>
> >> > Going off of
> >> >
> >> > ceph --cluster ceph --name client.bootstrap-osd --keyring
> >> > /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >> >
> >> > you could try passing "--keyring <keyring-path>" to the cephadm
> >> > ceph-volume command. Something like 'cephadm ceph-volume --keyring
> >> > <keyring-path> -- lvm create'. I'm guessing it's trying to run the
> >> > osd tree command within a container and I know cephadm mounts keyrings
> >> > passed to the ceph-volume command as
> >> > "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.
> >> >
> >> > On Mon, Feb 20, 2023 at 6:35 AM Eugen Block  wrote:
> >> >
> >> >> Hi *,
> >> >>
> >> >> I was playing around on an upgraded test cluster (from N to Q),
> >> >> current version:
> >> >>
> >> >>  "overall": {
> >> >>  "ceph version 17.2.5
> >> >> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
> >> >>  }
> >> >>
> >> >> I tried to replace an OSD after destroying it with 'ceph orch osd rm
> >> >> osd.5 --replace'. The OSD was drained successfully and marked as
> >> >> "destroyed" as expected, the zapping also worked. At this point I
> >> >> didn't have an osd spec in place because all OSDs were adopted during
> >> >> the upgrade process. So I created a new spec which was not applied
> >> >> successfully (I'm wondering if there's another/new issue with
> >> >> ceph-volume, but that's not the focus here), so I tried it manually
> >> >> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
> >> >> for better readability. Apparently, there's no bootstrap-osd keyring
> >> >> for cephadm so it can't search the desired osd_id in the osd tree, the
> >> >> command it tries is this:
> >> >>
> >> >> ceph --cluster ceph --name client.bootstrap-osd --keyring
> >> >> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >> >>
> >> >> In the local filesystem the required keyring is present, though:
> >> >>
> >> >> nautilus:

[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Eugen Block
I stumbled upon this option 'osd_id_claims' [2], so I tried to apply a  
replace.yaml to redeploy only the one destroyed disk, but still  
nothing happens with that disk. This is my replace.yaml:


---snip---
nautilus:~ # cat replace-osd-7.yaml
service_type: osd
service_name: osd
placement:
  hosts:
  - nautilus2
spec:
  data_devices:
    rotational: 1
    size: '20G:'
  db_devices:
    rotational: 0
    size: '13G:16G'
  filter_logic: AND
  objectstore: bluestore
osd_id_claims:
  nautilus2: ['7']
---snip---

I see these lines in the mgr.log:

Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log  
[INF] : Found osd claims -> {'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: [cephadm INFO  
cephadm.services.osd] Found osd claims for drivegroup None ->  
{'nautilus2': ['7']}
Feb 20 16:09:03 nautilus3 ceph-mgr[2994]: log_channel(cephadm) log  
[INF] : Found osd claims for drivegroup None -> {'nautilus2': ['7']}


But I see no attempt to actually deploy the OSD.

[2]  
https://docs.ceph.com/en/quincy/mgr/orchestrator_modules/#orchestrator-osd-replace
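
For completeness, this is roughly how I applied and checked the spec
(a sketch, same file as above):

---snip---
nautilus:~ # ceph orch apply -i replace-osd-7.yaml --dry-run
nautilus:~ # ceph orch apply -i replace-osd-7.yaml
# the destroyed OSD should still be listed here while it waits for replacement
nautilus:~ # ceph orch osd rm status
---snip---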


Quoting Adam King:


For reference, a stray daemon from cephadm POV is roughly just something
that shows up in "ceph node ls" that doesn't have a directory in
/var/lib/ceph/. I guess manually making the OSD as you did means that
didn't end up getting made. I remember the manual osd creation process (by
manual just meaning not using an orchestrator/cephadm mgr module command)
coming up at one point and then we ended up manually running "cephadm
deploy" to make sure those directories get created correctly, but I don't
think any docs ever got made about it (yet, anyway). Also, is there a
tracker issue for it not correctly handling the drivegroup?

On Mon, Feb 20, 2023 at 8:58 AM Eugen Block  wrote:


Thanks, Adam.

Providing the keyring to the cephadm command worked, but the unwanted
(but expected) side effect is that from cephadm perspective it's a
stray daemon. For some reason the orchestrator did apply the desired
drivegroup when I tried to reproduce this morning, but then again
failed just now when I wanted to get rid of the stray daemon. This is
one of the most annoying things with cephadm, I still don't fully
understand when it will correctly apply the identical drivegroup.yml
and when not. Anyway, the conclusion is to not interfere with cephadm
(nothing new here), but since the drivegroup was not applied correctly
I assumed I had to "help out" a bit by manually deploying an OSD.

Thanks,
Eugen

Quoting Adam King:

> Going off of
>
> ceph --cluster ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>
> you could try passing "--keyring <keyring-path>" to the cephadm
> ceph-volume command. Something like 'cephadm ceph-volume --keyring
> <keyring-path> -- lvm create'. I'm guessing it's trying to run the
> osd tree command within a container and I know cephadm mounts keyrings
> passed to the ceph-volume command as
> "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.
>
> On Mon, Feb 20, 2023 at 6:35 AM Eugen Block  wrote:
>
>> Hi *,
>>
>> I was playing around on an upgraded test cluster (from N to Q),
>> current version:
>>
>>  "overall": {
>>  "ceph version 17.2.5
>> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
>>  }
>>
>> I tried to replace an OSD after destroying it with 'ceph orch osd rm
>> osd.5 --replace'. The OSD was drained successfully and marked as
>> "destroyed" as expected, the zapping also worked. At this point I
>> didn't have an osd spec in place because all OSDs were adopted during
>> the upgrade process. So I created a new spec which was not applied
>> successfully (I'm wondering if there's another/new issue with
>> ceph-volume, but that's not the focus here), so I tried it manually
>> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
>> for better readability. Apparently, there's no bootstrap-osd keyring
>> for cephadm so it can't search the desired osd_id in the osd tree, the
>> command it tries is this:
>>
>> ceph --cluster ceph --name client.bootstrap-osd --keyring
>> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>>
>> In the local filesystem the required keyring is present, though:
>>
>> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
>> [client.bootstrap-osd]
>>  key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
>>  caps mgr = "allow r"
>>  caps mon = "profile bootstrap-osd"
>>
>> Is there something missing during the adoption process? Or are the
>> docs lacking some upgrade info? I found a section about putting
>> keyrings under management [1], but I'm not sure if that's what's
>> missing here.
>> Any insights are highly appreciated!
>>
>> Thanks,
>> Eugen
>>
>> [1]
>>
>>
https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under-management
>>
>>
>> ---snip---
>> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde
>> --block.db /dev/sdb --block.db-size 5G

[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Eugen Block
I haven't looked too closely for open tracker issues regarding
ceph-volume, to be honest. I'm still not even sure if I'm doing
something wrong or if it's an actual ceph issue. I still have a couple
of OSDs left to play around with in this cluster. So I tried it with a
different OSD: it is showing up as "destroyed" in the osd tree, but
the orchestrator isn't redeploying it although the osd disk and the
corresponding block.db LV have been wiped. There's nothing in
cephadm.log except the "check-host" and "gather-facts" entries. If I
removed the destroyed OSD from the crushmap I'm sure it would be
redeployed successfully, as it was earlier. Any idea why it's not
redeployed?
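
In case it helps, these are the kinds of checks I'd look at next (just
a sketch):

---snip---
# show the stored OSD spec(s), including the unmanaged flag
nautilus:~ # ceph orch ls osd --export
# force a rescan of the devices after zapping
nautilus:~ # ceph orch device ls --refresh
# is the replacement still pending?
nautilus:~ # ceph orch osd rm status
---snip---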


Quoting Adam King:


For reference, a stray daemon from cephadm POV is roughly just something
that shows up in "ceph node ls" that doesn't have a directory in
/var/lib/ceph/. I guess manually making the OSD as you did means that
didn't end up getting made. I remember the manual osd creation process (by
manual just meaning not using an orchestrator/cephadm mgr module command)
coming up at one point and then we ended up manually running "cephadm
deploy" to make sure those directories get created correctly, but I don't
think any docs ever got made about it (yet, anyway). Also, is there a
tracker issue for it not correctly handling the drivegroup?

On Mon, Feb 20, 2023 at 8:58 AM Eugen Block  wrote:


Thanks, Adam.

Providing the keyring to the cephadm command worked, but the unwanted
(but expected) side effect is that from cephadm perspective it's a
stray daemon. For some reason the orchestrator did apply the desired
drivegroup when I tried to reproduce this morning, but then again
failed just now when I wanted to get rid of the stray daemon. This is
one of the most annoying things with cephadm, I still don't fully
understand when it will correctly apply the identical drivegroup.yml
and when not. Anyway, the conclusion is to not interfere with cephadm
(nothing new here), but since the drivegroup was not applied correctly
I assumed I had to "help out" a bit by manually deploying an OSD.

Thanks,
Eugen

Quoting Adam King:

> Going off of
>
> ceph --cluster ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>
> you could try passing "--keyring <keyring-path>" to the cephadm
> ceph-volume command. Something like 'cephadm ceph-volume --keyring
> <keyring-path> -- lvm create'. I'm guessing it's trying to run the
> osd tree command within a container and I know cephadm mounts keyrings
> passed to the ceph-volume command as
> "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.
>
> On Mon, Feb 20, 2023 at 6:35 AM Eugen Block  wrote:
>
>> Hi *,
>>
>> I was playing around on an upgraded test cluster (from N to Q),
>> current version:
>>
>>  "overall": {
>>  "ceph version 17.2.5
>> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
>>  }
>>
>> I tried to replace an OSD after destroying it with 'ceph orch osd rm
>> osd.5 --replace'. The OSD was drained successfully and marked as
>> "destroyed" as expected, the zapping also worked. At this point I
>> didn't have an osd spec in place because all OSDs were adopted during
>> the upgrade process. So I created a new spec which was not applied
>> successfully (I'm wondering if there's another/new issue with
>> ceph-volume, but that's not the focus here), so I tried it manually
>> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
> >> for better readability. Apparently, there's no bootstrap-osd keyring
>> for cephadm so it can't search the desired osd_id in the osd tree, the
>> command it tries is this:
>>
>> ceph --cluster ceph --name client.bootstrap-osd --keyring
>> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>>
>> In the local filesystem the required keyring is present, though:
>>
>> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
>> [client.bootstrap-osd]
>>  key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
>>  caps mgr = "allow r"
>>  caps mon = "profile bootstrap-osd"
>>
>> Is there something missing during the adoption process? Or are the
>> docs lacking some upgrade info? I found a section about putting
>> keyrings under management [1], but I'm not sure if that's what's
>> missing here.
>> Any insights are highly appreciated!
>>
>> Thanks,
>> Eugen
>>
>> [1]
>>
>>
https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under-management
>>
>>
>> ---snip---
>> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde
>> --block.db /dev/sdb --block.db-size 5G
>> Inferring fsid 
>> Using recent ceph image
>> /ceph/ceph@sha256
>> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
>> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host
>> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
>> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
>> --init -e
>> CONTAINER_IMAGE=/ceph/ceph@sha256
:af50ec26

[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Adam King
For reference, a stray daemon from cephadm POV is roughly just something
that shows up in "ceph node ls" that doesn't have a directory in
/var/lib/ceph/. I guess manually making the OSD as you did means that
didn't end up getting made. I remember the manual osd creation process (by
manual just meaning not using an orchestrator/cephadm mgr module command)
coming up at one point and then we ended up manually running "cephadm
deploy" to make sure those directories get created correctly, but I don't
think any docs ever got made about it (yet, anyway). Also, is there a
tracker issue for it not correctly handling the drivegroup?
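
Roughly, the comparison boils down to something like this (just
illustrative, the paths depend on your fsid):

---snip---
# daemons the cluster itself knows about
ceph node ls
# daemon directories cephadm manages on this host
ls /var/lib/ceph/$(ceph fsid)/
# any CEPHADM_STRAY_DAEMON warnings end up here
ceph health detail
---snip---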

On Mon, Feb 20, 2023 at 8:58 AM Eugen Block  wrote:

> Thanks, Adam.
>
> Providing the keyring to the cephadm command worked, but the unwanted
> (but expected) side effect is that from cephadm perspective it's a
> stray daemon. For some reason the orchestrator did apply the desired
> drivegroup when I tried to reproduce this morning, but then again
> failed just now when I wanted to get rid of the stray daemon. This is
> one of the most annoying things with cephadm, I still don't fully
> understand when it will correctly apply the identical drivegroup.yml
> and when not. Anyway, the conclusion is to not interfere with cephadm
> (nothing new here), but since the drivegroup was not applied correctly
> I assumed I had to "help out" a bit by manually deploying an OSD.
>
> Thanks,
> Eugen
>
> Quoting Adam King:
>
> > Going off of
> >
> > ceph --cluster ceph --name client.bootstrap-osd --keyring
> > /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >
> > you could try passing "--keyring <keyring-path>" to the cephadm
> > ceph-volume command. Something like 'cephadm ceph-volume --keyring
> > <keyring-path> -- lvm create'. I'm guessing it's trying to run the
> > osd tree command within a container and I know cephadm mounts keyrings
> > passed to the ceph-volume command as
> > "/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.
> >
> > On Mon, Feb 20, 2023 at 6:35 AM Eugen Block  wrote:
> >
> >> Hi *,
> >>
> >> I was playing around on an upgraded test cluster (from N to Q),
> >> current version:
> >>
> >>  "overall": {
> >>  "ceph version 17.2.5
> >> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
> >>  }
> >>
> >> I tried to replace an OSD after destroying it with 'ceph orch osd rm
> >> osd.5 --replace'. The OSD was drained successfully and marked as
> >> "destroyed" as expected, the zapping also worked. At this point I
> >> didn't have an osd spec in place because all OSDs were adopted during
> >> the upgrade process. So I created a new spec which was not applied
> >> successfully (I'm wondering if there's another/new issue with
> >> ceph-volume, but that's not the focus here), so I tried it manually
> >> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
> >> for better readability. Apparently, there's no bootstrap-osd keyring
> >> for cephadm so it can't search the desired osd_id in the osd tree, the
> >> command it tries is this:
> >>
> >> ceph --cluster ceph --name client.bootstrap-osd --keyring
> >> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> >>
> >> In the local filesystem the required keyring is present, though:
> >>
> >> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
> >> [client.bootstrap-osd]
> >>  key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
> >>  caps mgr = "allow r"
> >>  caps mon = "profile bootstrap-osd"
> >>
> >> Is there something missing during the adoption process? Or are the
> >> docs lacking some upgrade info? I found a section about putting
> >> keyrings under management [1], but I'm not sure if that's what's
> >> missing here.
> >> Any insights are highly appreciated!
> >>
> >> Thanks,
> >> Eugen
> >>
> >> [1]
> >>
> >>
> https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under-management
> >>
> >>
> >> ---snip---
> >> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde
> >> --block.db /dev/sdb --block.db-size 5G
> >> Inferring fsid 
> >> Using recent ceph image
> >> /ceph/ceph@sha256
> >> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host
> >> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
> >> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
> >> --init -e
> >> CONTAINER_IMAGE=/ceph/ceph@sha256
> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> >> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
> >> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
> >> /var/run/ceph/:/var/run/ceph:z -v
> >> /var/log/ceph/:/var/log/ceph:z -v
> >> /var/lib/ceph//crash:/var/lib/ceph/crash:z -v /dev:/dev -v
> >> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
> >> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
> >> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
> >> /ceph/ceph@sha256
> :af50ec26db7ee177e1ec1b553a

[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Eugen Block

Thanks, Adam.

Providing the keyring to the cephadm command worked, but the unwanted  
(but expected) side effect is that from cephadm perspective it's a  
stray daemon. For some reason the orchestrator did apply the desired  
drivegroup when I tried to reproduce this morning, but then again  
failed just now when I wanted to get rid of the stray daemon. This is  
one of the most annoying things with cephadm, I still don't fully  
understand when it will correctly apply the identical drivegroup.yml  
and when not. Anyway, the conclusion is to not interfere with cephadm  
(nothing new here), but since the drivegroup was not applied correctly  
I assumed I had to "help out" a bit by manually deploying an OSD.


Thanks,
Eugen

Quoting Adam King:


Going off of

ceph --cluster ceph --name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json

you could try passing "--keyring <keyring-path>" to the cephadm
ceph-volume command. Something like 'cephadm ceph-volume --keyring
<keyring-path> -- lvm create'. I'm guessing it's trying to run the
osd tree command within a container and I know cephadm mounts keyrings
passed to the ceph-volume command as
"/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.

On Mon, Feb 20, 2023 at 6:35 AM Eugen Block  wrote:


Hi *,

I was playing around on an upgraded test cluster (from N to Q),
current version:

 "overall": {
 "ceph version 17.2.5
(98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
 }

I tried to replace an OSD after destroying it with 'ceph orch osd rm
osd.5 --replace'. The OSD was drained successfully and marked as
"destroyed" as expected, the zapping also worked. At this point I
didn't have an osd spec in place because all OSDs were adopted during
the upgrade process. So I created a new spec which was not applied
successfully (I'm wondering if there's another/new issue with
ceph-volume, but that's not the focus here), so I tried it manually
with 'cephadm ceph-volume lvm create'. I'll add the output at the end
for better readability. Apparently, there's no bootstrap-osd keyring
for cephadm so it can't search the desired osd_id in the osd tree, the
command it tries is this:

ceph --cluster ceph --name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json

In the local filesystem the required keyring is present, though:

nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
[client.bootstrap-osd]
 key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
 caps mgr = "allow r"
 caps mon = "profile bootstrap-osd"

Is there something missing during the adoption process? Or are the
docs lacking some upgrade info? I found a section about putting
keyrings under management [1], but I'm not sure if that's what's
missing here.
Any insights are highly appreciated!

Thanks,
Eugen

[1]

https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under-management


---snip---
nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde
--block.db /dev/sdb --block.db-size 5G
Inferring fsid 
Using recent ceph image
/ceph/ceph@sha256
:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host
--stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
--entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
--init -e
CONTAINER_IMAGE=/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
-e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
/var/run/ceph/:/var/run/ceph:z -v
/var/log/ceph/:/var/log/ceph:z -v
/var/lib/ceph//crash:/var/lib/ceph/crash:z -v /dev:/dev -v
/run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
/run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
/tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size
5G
/usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\"
doesn't exist, skipping"
/usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from
\"/etc/containers/mounts.conf\" doesn't exist, skipping"
/usr/bin/podman: stderr Running command: /usr/bin/ceph-authtool
--gen-print-key
/usr/bin/podman: stderr Running command: /usr/bin/ceph --cluster ceph
--name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
/usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.848+
7fd255e30700 -1 auth: unable to find a keyring on
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
(2) No such file or
directory
/usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.848+
7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
/etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.key

[ceph-users] Re: Missing keyrings on upgraded cluster

2023-02-20 Thread Adam King
Going off of

ceph --cluster ceph --name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json

you could try passing "--keyring <keyring-path>" to the cephadm
ceph-volume command. Something like 'cephadm ceph-volume --keyring
<keyring-path> -- lvm create'. I'm guessing it's trying to run the
osd tree command within a container and I know cephadm mounts keyrings
passed to the ceph-volume command as
"/var/lib/ceph/bootstrap-osd/ceph.keyring" inside the container.

On Mon, Feb 20, 2023 at 6:35 AM Eugen Block  wrote:

> Hi *,
>
> I was playing around on an upgraded test cluster (from N to Q),
> current version:
>
>  "overall": {
>  "ceph version 17.2.5
> (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)": 18
>  }
>
> I tried to replace an OSD after destroying it with 'ceph orch osd rm
> osd.5 --replace'. The OSD was drained successfully and marked as
> "destroyed" as expected, the zapping also worked. At this point I
> didn't have an osd spec in place because all OSDs were adopted during
> the upgrade process. So I created a new spec which was not applied
> successfully (I'm wondering if there's another/new issue with
> ceph-volume, but that's not the focus here), so I tried it manually
> with 'cephadm ceph-volume lvm create'. I'll add the output at the end
> for better readability. Apparently, there's no bootstrap-osd keyring
> for cephadm so it can't search the desired osd_id in the osd tree, the
> command it tries is this:
>
> ceph --cluster ceph --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
>
> In the local filesystem the required keyring is present, though:
>
> nautilus:~ # cat /var/lib/ceph/bootstrap-osd/ceph.keyring
> [client.bootstrap-osd]
>  key = AQBOCbpgixIsOBAAgBzShsFg/l1bOze4eTZHug==
>  caps mgr = "allow r"
>  caps mon = "profile bootstrap-osd"
>
> Is there something missing during the adoption process? Or are the
> docs lacking some upgrade info? I found a section about putting
> keyrings under management [1], but I'm not sure if that's what's
> missing here.
> Any insights are highly appreciated!
>
> Thanks,
> Eugen
>
> [1]
>
> https://docs.ceph.com/en/quincy/cephadm/operations/#putting-a-keyring-under-management
>
>
> ---snip---
> nautilus:~ # cephadm ceph-volume lvm create --osd-id 5 --data /dev/sde
> --block.db /dev/sdb --block.db-size 5G
> Inferring fsid 
> Using recent ceph image
> /ceph/ceph@sha256
> :af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host
> --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host
> --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
> --init -e
> CONTAINER_IMAGE=/ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> -e NODE_NAME=nautilus -e CEPH_USE_RANDOM_NONCE=1 -e
> CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v
> /var/run/ceph/:/var/run/ceph:z -v
> /var/log/ceph/:/var/log/ceph:z -v
> /var/lib/ceph//crash:/var/lib/ceph/crash:z -v /dev:/dev -v
> /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v
> /run/lock/lvm:/run/lock/lvm -v /:/rootfs -v
> /tmp/ceph-tmpuydvbhuk:/etc/ceph/ceph.conf:z
> /ceph/ceph@sha256:af50ec26db7ee177e1ec1b553a0d6a9dbad2c3cc0da2f8f46d012184a79d4f92
> lvm create --osd-id 5 --data /dev/sde --block.db /dev/sdb --block.db-size
> 5G
> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
> msg="Path \"/etc/SUSEConnect\" from \"/etc/containers/mounts.conf\"
> doesn't exist, skipping"
> /usr/bin/podman: stderr time="2023-02-20T09:02:49+01:00" level=warning
> msg="Path \"/etc/zypp/credentials.d/SCCcredentials\" from
> \"/etc/containers/mounts.conf\" doesn't exist, skipping"
> /usr/bin/podman: stderr Running command: /usr/bin/ceph-authtool
> --gen-print-key
> /usr/bin/podman: stderr Running command: /usr/bin/ceph --cluster ceph
> --name client.bootstrap-osd --keyring
> /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.848+
> 7fd255e30700 -1 auth: unable to find a keyring on
> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin:
> (2) No such file or
> directory
> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.848+
> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
> /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,
> disabling
> cephx
> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.852+
> 7fd255e30700 -1 auth: unable to find a keyring on
> /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.852+
> 7fd255e30700 -1 AuthRegistry(0x7fd250060d50) no keyring found at
> /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> /usr/bin/podman: stderr  stderr: 2023-02-20T08:02:50.856+
> 7fd255e30700 -1 auth: unable to find a keyring on
> /var/lib/ceph/bootstrap-osd/c