This is the exact issue that I ran into when starting my bluestore conversion 
journey.

See my thread here: https://www.spinics.net/lists/ceph-users/msg41802.html

Specifying --osd-id causes it to fail.

Below are my steps for replacing/migrating an OSD from filestore to bluestore.

BIG caveat here: I am doing a destructive replacement, meaning I do not let 
the objects migrate off the OSD I'm replacing before nuking it.
With 8TB drives that just takes far too long, and I trust my failure domains 
and the rest of my hardware to get me through the backfills.
So instead of 1) reading the data off and writing it elsewhere, 2) removing 
and re-adding the OSD, 3) reading the data back from elsewhere and writing it 
on again, I take step one out and trust my two other copies of the objects. 
Just wanted to clarify my steps.

I also set the norecover and norebalance flags immediately before running 
these commands so that the cluster doesn't start moving data unnecessarily. 
When done, I remove those flags and let it backfill.
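
For reference, the flag dance is nothing exotic, just the standard cluster 
flags:

> ceph osd set norecover
> ceph osd set norebalance
> ... do the replacement steps below ...
> ceph osd unset norebalance
> ceph osd unset norecover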

> systemctl stop ceph-osd@$ID.service
> ceph-osd -i $ID --flush-journal
> umount /var/lib/ceph/osd/ceph-$ID
> ceph-volume lvm zap /dev/$DATA
> ceph osd crush remove osd.$ID
> ceph auth del osd.$ID
> ceph osd rm osd.$ID
> ceph-volume lvm create --bluestore --data /dev/$DATA --block.db /dev/$NVME
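
For concreteness, a hypothetical run with $ID=7, $DATA=sdc and $NVME=nvme0n1p1 
(your OSD IDs and device names will of course differ) would be:

> systemctl stop ceph-osd@7.service
> ceph-osd -i 7 --flush-journal
> umount /var/lib/ceph/osd/ceph-7
> ceph-volume lvm zap /dev/sdc
> ceph osd crush remove osd.7
> ceph auth del osd.7
> ceph osd rm osd.7
> ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p1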

So essentially I fully remove the OSD from the CRUSH map and the osdmap, and 
when I add the OSD back, just as I would a brand-new OSD, it fills in the 
numeric gap with the $ID it had before.
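
A quick sanity check after the create (plain ceph CLI, nothing 
version-specific):

> ceph osd tree | grep osd.$ID

should show the OSD back at its old ID, and once the flags are unset, 
ceph -s will show the backfill progressing.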

Hope this is helpful.
It's been working well for me so far, doing 3 OSDs at a time (half of a 
failure domain).

Reed

> On Jan 26, 2018, at 10:01 AM, David <da...@visions.se> wrote:
> 
> 
> Hi!
> 
> On luminous 12.2.2
> 
> I'm migrating some OSDs from filestore to bluestore using the "simple" method 
> ("Mark out and Replace") as described in the docs: 
> http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/#convert-existing-osds
> 
> However, at step 9: ceph-volume create --bluestore --data $DEVICE --osd-id $ID
> it seems to create the bluestore OSD, but it fails to authenticate with the 
> old osd-id auth.
> (the command above is also missing lvm or simple)
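> I assume the intended command is:
> # ceph-volume lvm create --bluestore --data $DEVICE --osd-id $ID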
> 
> I think it's related to this:
> http://tracker.ceph.com/issues/22642
> 
> # ceph-volume lvm create --bluestore --data /dev/sdc --osd-id 0
> Running command: sudo vgcreate --force --yes 
> ceph-efad7df8-721d-43d8-8d02-449406e70b90 /dev/sdc
>  stderr: WARNING: lvmetad is running but disabled. Restart lvmetad before 
> enabling it!
>  stdout: Physical volume "/dev/sdc" successfully created
>  stdout: Volume group "ceph-efad7df8-721d-43d8-8d02-449406e70b90" 
> successfully created
> Running command: sudo lvcreate --yes -l 100%FREE -n 
> osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9 
> ceph-efad7df8-721d-43d8-8d02-449406e70b90
>  stderr: WARNING: lvmetad is running but disabled. Restart lvmetad before 
> enabling it!
>  stdout: Logical volume "osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9" 
> created.
> Running command: sudo mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
> Running command: chown -R ceph:ceph /dev/dm-4
> Running command: sudo ln -s 
> /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
>  /var/lib/ceph/osd/ceph-0/block
> Running command: sudo ceph --cluster ceph --name client.bootstrap-osd 
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o 
> /var/lib/ceph/osd/ceph-0/activate.monmap
>  stderr: got monmap epoch 2
> Running command: ceph-authtool /var/lib/ceph/osd/ceph-0/keyring 
> --create-keyring --name osd.0 --add-key XXXXXXXX
>  stdout: creating /var/lib/ceph/osd/ceph-0/keyring
>  stdout: added entity osd.0 auth auth(auid = 18446744073709551615 key= 
> XXXXXXXX with 0 caps)
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/keyring
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0/
> Running command: sudo ceph-osd --cluster ceph --osd-objectstore bluestore 
> --mkfs -i 0 --monmap /var/lib/ceph/osd/ceph-0/activate.monmap --key 
> **************************************** --osd-data /var/lib/ceph/osd/ceph-0/ 
> --osd-uuid 138ce507-f28a-45bf-814c-7fa124a9d9b9 --setuser ceph --setgroup ceph
>  stderr: 2018-01-26 14:59:10.039549 7fd7ef951cc0 -1 
> bluestore(/var/lib/ceph/osd/ceph-0//block) _read_bdev_label unable to decode 
> label at offset 102: buffer::malformed_input: void 
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end 
> of struct encoding
>  stderr: 2018-01-26 14:59:10.039744 7fd7ef951cc0 -1 
> bluestore(/var/lib/ceph/osd/ceph-0//block) _read_bdev_label unable to decode 
> label at offset 102: buffer::malformed_input: void 
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end 
> of struct encoding
>  stderr: 2018-01-26 14:59:10.039925 7fd7ef951cc0 -1 
> bluestore(/var/lib/ceph/osd/ceph-0//block) _read_bdev_label unable to decode 
> label at offset 102: buffer::malformed_input: void 
> bluestore_bdev_label_t::decode(ceph::buffer::list::iterator&) decode past end 
> of struct encoding
>  stderr: 2018-01-26 14:59:10.039984 7fd7ef951cc0 -1 
> bluestore(/var/lib/ceph/osd/ceph-0/) _read_fsid unparsable uuid
>  stderr: 2018-01-26 14:59:11.359951 7fd7ef951cc0 -1 key XXXXXXXX
>  stderr: 2018-01-26 14:59:11.888476 7fd7ef951cc0 -1 created object store 
> /var/lib/ceph/osd/ceph-0/ for osd.0 fsid efad7df8-721d-43d8-8d02-449406e70b90
> Running command: sudo ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev 
> /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
>  --path /var/lib/ceph/osd/ceph-0
> Running command: sudo ln -snf 
> /dev/ceph-efad7df8-721d-43d8-8d02-449406e70b90/osd-block-138ce507-f28a-45bf-814c-7fa124a9d9b9
>  /var/lib/ceph/osd/ceph-0/block
> Running command: chown -R ceph:ceph /dev/dm-4
> Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
> Running command: sudo systemctl enable 
> ceph-volume@lvm-0-138ce507-f28a-45bf-814c-7fa124a9d9b9
>  stderr: Created symlink from 
> /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-0-138ce507-f28a-45bf-814c-7fa124a9d9b9.service
>  to /lib/systemd/system/ceph-volume@.service.
> Running command: sudo systemctl start ceph-osd@0
> 
> ceph-osd.0.log shows:
> 
> 2018-01-26 15:09:07.379039 7f545d3b9cc0  4 rocksdb: 
> [/build/ceph-12.2.2/src/rocksdb/db/version_set.cc:2859] Recovered from manifest 
> file:db/MANIFEST-000095 succeeded,manifest_file_number is 95, 
> next_file_number is 97, last_sequence is 21, log_number is 0,prev_log_number 
> is 0,max_column_family is 0
> 
> 2018-01-26 15:09:07.379046 7f545d3b9cc0  4 rocksdb: 
> [/build/ceph-12.2.2/src/rocksdb/db/version_set.cc:2867] Column family 
> [default] (ID 0), log number is 94
> 
> 2018-01-26 15:09:07.379087 7f545d3b9cc0  4 rocksdb: EVENT_LOG_v1 
> {"time_micros": 1516979347379083, "job": 1, "event": "recovery_started", 
> "log_files": [96]}
> 2018-01-26 15:09:07.379091 7f545d3b9cc0  4 rocksdb: 
> [/build/ceph-12.2.2/src/rocksdb/db/db_impl_open.cc:482] Recovering log #96 mode 0
> 2018-01-26 15:09:07.379102 7f545d3b9cc0  4 rocksdb: 
> [/build/ceph-12.2.2/src/rocksdb/db/version_set.cc:2395] Creating manifest 98
> 
> 2018-01-26 15:09:07.380466 7f545d3b9cc0  4 rocksdb: EVENT_LOG_v1 
> {"time_micros": 1516979347380463, "job": 1, "event": "recovery_finished"}
> 2018-01-26 15:09:07.381331 7f545d3b9cc0  4 rocksdb: 
> [/build/ceph-12.2.2/src/rocksdb/db/db_impl_open.cc:1063] DB pointer 0x556ecb8c3000
> 2018-01-26 15:09:07.381353 7f545d3b9cc0  1 
> bluestore(/var/lib/ceph/osd/ceph-0) _open_db opened rocksdb path db options 
> compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
> 2018-01-26 15:09:07.381616 7f545d3b9cc0  1 freelist init
> 2018-01-26 15:09:07.381660 7f545d3b9cc0  1 
> bluestore(/var/lib/ceph/osd/ceph-0) _open_alloc opening allocation metadata
> 2018-01-26 15:09:07.381679 7f545d3b9cc0  1 
> bluestore(/var/lib/ceph/osd/ceph-0) _open_alloc loaded 447 G in 1 extents
> 2018-01-26 15:09:07.382077 7f545d3b9cc0  0 _get_class not permitted to load 
> kvs
> 2018-01-26 15:09:07.382309 7f545d3b9cc0  0 <cls> 
> /build/ceph-12.2.2/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
> 2018-01-26 15:09:07.382583 7f545d3b9cc0  0 _get_class not permitted to load 
> sdk
> 2018-01-26 15:09:07.382827 7f545d3b9cc0  0 <cls> 
> /build/ceph-12.2.2/src/cls/hello/cls_hello.cc:296: loading cls_hello
> 2018-01-26 15:09:07.385755 7f545d3b9cc0  0 _get_class not permitted to load 
> lua
> 2018-01-26 15:09:07.386073 7f545d3b9cc0  0 osd.0 0 crush map has features 
> 288232575208783872, adjusting msgr requires for clients
> 2018-01-26 15:09:07.386078 7f545d3b9cc0  0 osd.0 0 crush map has features 
> 288232575208783872 was 8705, adjusting msgr requires for mons
> 2018-01-26 15:09:07.386079 7f545d3b9cc0  0 osd.0 0 crush map has features 
> 288232575208783872, adjusting msgr requires for osds
> 2018-01-26 15:09:07.386132 7f545d3b9cc0  0 osd.0 0 load_pgs
> 2018-01-26 15:09:07.386134 7f545d3b9cc0  0 osd.0 0 load_pgs opened 0 pgs
> 2018-01-26 15:09:07.386137 7f545d3b9cc0  0 osd.0 0 using weightedpriority op 
> queue with priority op cut off at 64.
> 2018-01-26 15:09:07.386580 7f545d3b9cc0 -1 osd.0 0 log_to_monitors 
> {default=true}
> 2018-01-26 15:09:07.388077 7f545d3b9cc0 -1 osd.0 0 init authentication 
> failed: (1) Operation not permitted
> 
> 
> The old osd is still there.
> 
> # ceph osd tree
> ID CLASS WEIGHT  TYPE NAME     STATUS    REWEIGHT PRI-AFF
> -1       2.60458 root default
> -2       0.86819     host int1
>  0   ssd 0.43159         osd.0 destroyed        0 1.00000
>  3   ssd 0.43660         osd.3        up  1.00000 1.00000
> -3       0.86819     host int2
>  1   ssd 0.43159         osd.1        up  1.00000 1.00000
>  4   ssd 0.43660         osd.4        up  1.00000 1.00000
> -4       0.86819     host int3
>  2   ssd 0.43159         osd.2        up  1.00000 1.00000
>  5   ssd 0.43660         osd.5        up  1.00000 1.00000
> 
> 
> What's the best course of action? Purging osd.0, zapping the device again and 
> creating without --osd-id set?
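> I.e. something like this, I assume (untested guess; the VG/LV left from the 
> failed attempt may need removing first):
> 
> # ceph osd purge 0 --yes-i-really-mean-it
> # ceph-volume lvm zap /dev/sdc
> # ceph-volume lvm create --bluestore --data /dev/sdc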
> 
> 
> Kind Regards,
> 
> David Majchrzak

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
