On Wed, Sep 11, 2019 at 11:17:47AM +0100, Matthew Vernon wrote:
>Hi,
>
>We keep finding part-made OSDs (they appear not attached to any host,
>and down and out; but still counting towards the number of OSDs); we
>never saw this with ceph-disk. On investigation, this is because
>ceph-volume lvm create makes the OSD (ID and auth at least) too early in
>the process and is then unable to roll-back cleanly (because the
>bootstrap-osd credential isn't allowed to remove OSDs).
>
>As an example (very truncated):
>
>Running command: /usr/bin/ceph --cluster ceph --name
>client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
>-i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
>Running command: vgcreate --force --yes
>ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
> stderr: Device /dev/sdbh not found (or ignored by filtering).
>  Unable to add physical volume '/dev/sdbh' to volume group
>'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
>--> Was unable to complete a new OSD, will rollback changes
>--> OSD will be fully purged from the cluster, because the ID was generated
>Running command: ceph osd purge osd.828 --yes-i-really-mean-it
> stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find
>a keyring on
>/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
>(2) No such file or directory
> stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
>authenticate NOTE: no keyring found; disabled cephx authentication
>2019-09-10 15:07:53.397334 7fbca2caf700  0 librados: client.admin
>authentication error (95) Operation not supported
>
>This is annoying to have to clear up, and it seems to me could be
>avoided by either:
>
>i) ceph-volume should (attempt to) set up the LVM volumes &c before
>making the new OSD id
>or
>ii) allow the bootstrap-osd credential to purge OSDs
>
>i) seems like clearly the better answer...?
Agreed. Would you mind opening a bug report at
https://tracker.ceph.com/projects/ceph-volume?

I have found other situations where a roll-back is not working as it should,
though not with as much impact as this.
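
For the bug report, option i) could be sketched roughly as below. This is only
an illustration of the ordering, not ceph-volume's actual code: create_osd,
prepare_lvm, remove_lvm and register_osd are hypothetical names. The idea is to
do the local LVM work first and ask the cluster for an OSD id last, so that a
failed vgcreate leaves nothing behind that bootstrap-osd cannot remove.

```python
# Hypothetical sketch, NOT ceph-volume's real code: every name here is
# made up for illustration.

class Rollback:
    """Collect undo callbacks and run them in reverse order on failure."""

    def __init__(self):
        self._undo = []

    def add(self, fn):
        self._undo.append(fn)

    def run(self):
        while self._undo:
            self._undo.pop()()


def create_osd(prepare_lvm, remove_lvm, register_osd):
    """Set up LVM first, register the OSD with the cluster last.

    prepare_lvm()        -> volume; may fail (e.g. device filtered out)
    remove_lvm(volume)   -> undoes prepare_lvm, needs no cluster credential
    register_osd(volume) -> OSD id; the only step that touches the cluster
    """
    rollback = Rollback()
    try:
        volume = prepare_lvm()
        rollback.add(lambda: remove_lvm(volume))
        # Cluster-side state is created only after the local steps worked.
        return register_osd(volume)
    except Exception:
        # Only local state exists at this point, so no OSD purge is needed.
        rollback.run()
        raise
```

With this ordering, a failure like the "Device /dev/sdbh not found" above
would abort before any OSD id or auth entry exists.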
>
>Regards,
>
>Matthew
>




>_______________________________________________
>ceph-users mailing list
>ceph-us...@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Jan Fajerski
Senior Software Engineer Enterprise Storage
SUSE Software Solutions Germany GmbH
(HRB 247165, AG München)
Geschäftsführer: Felix Imendörffer
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
