Hi,

Can someone comment on / confirm my planned OSD replacement procedure?

It would be very helpful to me.

Dietmar

On 11 January 2018 17:47:50 CET, Dietmar Rieder
<dietmar.rie...@i-med.ac.at> wrote:
>Hi Alfredo,
>
>thanks for your comments, see my answers inline.
>
>On 01/11/2018 01:47 PM, Alfredo Deza wrote:
>> On Thu, Jan 11, 2018 at 4:30 AM, Dietmar Rieder
>> <dietmar.rie...@i-med.ac.at> wrote:
>>> Hello,
>>>
>>> we have a failed OSD disk in our Luminous v12.2.2 cluster that needs
>>> to be replaced.
>>>
>>> The cluster was initially deployed using ceph-deploy on Luminous
>>> v12.2.0. The OSDs were created using
>>>
>>> ceph-deploy osd create --bluestore cephosd-${osd}:/dev/sd${disk}
>>> --block-wal /dev/nvme0n1 --block-db /dev/nvme0n1
>>>
>>> Note we separated the bluestore data, wal and db.
>>>
>>> We updated to Luminous v12.2.1 and further to Luminous v12.2.2.
>>>
>>> With the last update we also let ceph-volume take over the OSDs using
>>> "ceph-volume simple scan /var/lib/ceph/osd/$osd" and "ceph-volume
>>> simple activate ${osd} ${id}". All of this went smoothly.
>> 
>> That is good to hear!
>> 
>>>
>>> Now I wonder: what is the correct way to replace a failed OSD block
>>> disk?
>>>
>>> The docs for luminous [1] say:
>>>
>>> REPLACING AN OSD
>>>
>>> 1. Destroy the OSD first:
>>>
>>> ceph osd destroy {id} --yes-i-really-mean-it
>>>
>>> 2. Zap a disk for the new OSD, if the disk was used before for other
>>> purposes. It’s not necessary for a new disk:
>>>
>>> ceph-disk zap /dev/sdX
>>>
>>>
>>> 3. Prepare the disk for replacement by using the previously destroyed
>>> OSD id:
>>>
>>> ceph-disk prepare --bluestore /dev/sdX --osd-id {id} --osd-uuid `uuidgen`
>>>
>>>
>>> 4. And activate the OSD:
>>>
>>> ceph-disk activate /dev/sdX1
>>>
>>>
>>> Initially this seems to be straightforward, but...
>>>
>>> 1. I'm not sure if there is something to do with the still existing
>>> bluefs db and wal partitions on the nvme device for the failed OSD. Do
>>> they have to be zapped? If yes, what is the best way? There is nothing
>>> mentioned in the docs.
>> 
>> What is your concern here if the activation seems to work?
>
>I guess on the nvme partitions for bluefs db and bluefs wal there is
>still data related to the failed OSD block device. I was thinking that
>this data might "interfere" with the new replacement OSD block device,
>which is empty.
>
>So you are saying that this is no concern, right?
>Are they automatically reused and assigned to the replacement OSD block
>device, or do I have to specify them when running ceph-disk prepare?
>If I need to specify the wal and db partitions, how is this done?
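>
>Would it be something like the following? This is just a guess/sketch on
>my side, using the existing nvme partitions of the failed OSD as an
>example; I don't know if pointing --block.db/--block.wal at them is the
>right approach:
>
>ceph-disk prepare --bluestore --block.db /dev/nvme1n1p2 \
>    --block.wal /dev/nvme1n1p3 /dev/sdo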
>
>I'm asking this since from the logs of the initial cluster deployment I
>got the following warning:
>
>[cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
>block.db is not the same device as the osd data
>[...]
>[cephosd-02][WARNING] prepare_device: OSD will not be hot-swappable if
>block.wal is not the same device as the osd data
>
>
>>>
>>> 2. Since we already let "ceph-volume simple" take over our OSDs I'm not
>>> sure if we should now use ceph-volume or again ceph-disk (followed by
>>> "ceph-volume simple" takeover) to prepare and activate the OSD?
>> 
>> The `simple` sub-command is meant to help with the activation of OSDs
>> at boot time, supporting ceph-disk (or manual) created OSDs.
>
>OK, got this...
>
>> 
>> There is no requirement to use `ceph-volume lvm` which is intended for
>> new OSDs using LVM as devices.
>
>Fine...
>
>>>
>>> 3. If we should use ceph-volume, then by looking at the luminous
>>> ceph-volume docs [2] I find for both,
>>>
>>> ceph-volume lvm prepare
>>> ceph-volume lvm activate
>>>
>>> that the bluestore option is either NOT implemented or NOT supported
>>>
>>> activate: [--bluestore] filestore (IS THIS A TYPO???) objectstore (not
>>> yet implemented)
>>> prepare: [--bluestore] Use the bluestore objectstore (not currently
>>> supported)
>> 
>> These might be typos on the man page, will get that addressed. Ticket
>> opened at http://tracker.ceph.com/issues/22663
>
>Thanks
>
>> bluestore as of 12.2.2 is fully supported and it is the default. The
>> --help output in ceph-volume has the flags updated and correctly shows
>> this.
>
>OK
>
>>>
>>>
>>> So, now I'm completely lost. How is all of this fitting together in
>>> order to replace a failed OSD?
>> 
>> You would need to keep using ceph-disk. Unless you want ceph-volume to
>> take over, in which case you would need to follow the steps to deploy a
>> new OSD with ceph-volume.
>
>OK
>
>> Note that although --osd-id is supported, there is an issue with that
>> on 12.2.2 that would prevent you from correctly deploying it:
>> http://tracker.ceph.com/issues/22642
>> 
>> The recommendation, if you want to use ceph-volume, would be to omit
>> --osd-id and let the cluster give you the ID.
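>> 
>> Roughly, and only as a sketch (the vg/lv names and the db/wal partitions
>> below are just placeholders, adjust them to your layout), deploying the
>> replacement with ceph-volume could look like:
>> 
>> vgcreate ceph-osd-new /dev/sdo
>> lvcreate -l 100%FREE -n data ceph-osd-new
>> ceph-volume lvm create --bluestore --data ceph-osd-new/data \
>>     --block.db /dev/nvme1n1p2 --block.wal /dev/nvme1n1p3
>> 
>> Note the absence of --osd-id, per the recommendation above.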
>> 
>>>
>>> 4. More... after reading some recent threads on this list, additional
>>> questions came up:
>>>
>>> According to the OSD replacement doc [1] :
>>>
>>> "When disks fail, [...], OSDs need to be replaced. Unlike Removing
>the
>>> OSD, replaced OSD’s id and CRUSH map entry need to be keep [TYPO
>HERE?
>>> keep -> kept] intact after the OSD is destroyed for replacement."
>>>
>>> but
>>> http://tracker.ceph.com/issues/22642 seems to say that it is not
>>> possible to reuse an OSD's id
>> 
>> That is a ceph-volume specific issue, unrelated to how replacement in
>> Ceph works.
>
>OK
>
>>>
>>>
>>> So I'm quite lost with an essential, very basic, seemingly simple task
>>> of storage management.
>> 
>> You have two choices:
>> 
>> 1) keep using ceph-disk as always, even though you have "ported" your
>> OSDs with `ceph-volume simple`
>> 2) Deploy new OSDs with ceph-volume
>> 
>> For #1 you will want to keep running `simple` on newly deployed OSDs
>> so that they can come up after a reboot, since `simple` disables the
>> udev rules
>> that caused activation with ceph-disk
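>> 
>> For example, only as a sketch (using your osd id 33 and whatever fsid
>> the scan reports):
>> 
>> ceph-volume simple scan /var/lib/ceph/osd/ceph-33
>> ceph-volume simple activate 33 <osd-fsid>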
>
>OK, thanks so much for clarifying these things. I'll go for the
>ceph-disk option then.
>
>Just to be sure, these are the steps I would take:
>
>1.
>ceph osd destroy osd.33 --yes-i-really-mean-it
>
>2.
>remove the failed HDD and replace it with a new HDD
>
>3.
>ceph-disk prepare --bluestore /dev/sdo  --osd-id osd.33
>
>OR
>
>do I need to specify the wal and db partitions on the nvme here, as
>Konstantin suggested in his answer to my question:
>
>3.1. Find the nvme partitions for this OSD using ceph-disk, which gives me:
>
>/dev/nvme1n1p2 ceph block.db
>/dev/nvme1n1p3 ceph block.wal
>
>3.2. Delete the partitions via parted or fdisk.
>
>fdisk -u /dev/nvme1n1
>d (delete partitions)
>enter partition number of block.db: 2
>d
>enter partition number of block.wal: 3
>w (write partition table)
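>
>Or would it be OK to do this non-interactively instead? Just a sketch,
>assuming sgdisk is available and the partition numbers really are 2 and 3:
>
>sgdisk --delete=2 --delete=3 /dev/nvme1n1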
>
>3.3. run ceph-disk prepare
>
>ceph-disk -v prepare --block.wal /dev/nvme1n1 --block.db /dev/nvme1n1 \
>--bluestore /dev/sdo --osd-id osd.33
>
>4.
>Do I need to run "ceph-disk activate"?
>
>ceph-disk activate /dev/sdo1
>
>or any of the "ceph-volume simple" commands now?
>
>or just start the osd with systemctl?
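>
>For reference, I assume the systemctl variant would be something like
>this (just a sketch, with the same id 33 as above):
>
>systemctl start ceph-osd@33
>
>and the ceph-volume variant would again be a "simple scan" plus "simple
>activate" for the new OSD.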
>
>Thanks so much, and sorry for my ignorance ;-)
>
>~Best
>   Dietmar
>
>-- 
>_________________________________________
>D i e t m a r  R i e d e r, Mag.Dr.
>Innsbruck Medical University
>Biocenter - Division for Bioinformatics
>Email: dietmar.rie...@i-med.ac.at
>Web:   http://www.icbi.at

-- 
This message was sent from my Android phone with K-9 Mail.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
