It seems that I have been able to work around my issues.
I’ve tried to reproduce the problem by rebooting nodes, and by stopping all OSDs, waiting a bit, and starting them again.
So far, no OSDs are crashing as they did before, and they start without problems.
What I did was completely remove the OSDs, one at a time, and redeploy them, letting Ceph 14.2.1 create them fresh.
Remove a disk:
1.) See which OSD is on which disk: sudo ceph-volume lvm list

2.) ceph osd out X
synergy@synergy3:~$ ceph osd out 21
marked out osd.21.

2.a) ceph osd down osd.X
ceph osd down osd.21

2.aa) Stop OSD daemon: sudo systemctl stop ceph-osd@X
sudo systemctl stop ceph-osd@21

2.b) ceph osd rm osd.X
ceph osd rm osd.21

3.) Check status: ceph -s

4.) Observe data migration: ceph -w

5.) Remove from CRUSH: ceph osd crush remove {name}
EX: ceph osd crush remove osd.21
5.b) Delete the auth key: ceph auth del osd.21

6.) Find identity info for the disk (model, serial number):
sudo hdparm -I /dev/sdd

7.) See the SATA ports: lsscsi --verbose

8.) Pull the disk and replace it, or keep it and do the following steps to reuse it.
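Steps 2.a through 5.b can be wrapped in one shell function. This is a minimal sketch, not a tested procedure: `run`, `remove_osd` and `DRY_RUN` are hypothetical names, and it assumes the OSD was already marked out (step 2) and that `ceph -s` shows data migration has finished. It stops the daemon before marking the OSD down, since a running daemon would just report itself up again, and it follows the order in the upstream Ceph manual-removal docs (crush remove, auth del, then osd rm). On Luminous and later, `ceph osd purge osd.21 --yes-i-really-mean-it` does the last three steps in one command.

```shell
#!/usr/bin/env bash
# Sketch: retire an OSD that is already "out" and fully drained.
# Hypothetical helpers; set DRY_RUN=1 to print the commands instead of running them.

# Run a command, or just echo it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# remove_osd ID: stop the daemon, then drop the OSD from CRUSH, auth and the
# OSD map. The && chain stops at the first command that fails.
remove_osd() {
  local id="$1"
  run sudo systemctl stop "ceph-osd@${id}" &&   # step 2.aa
  run ceph osd down "osd.${id}" &&              # step 2.a
  run ceph osd crush remove "osd.${id}" &&      # step 5
  run ceph auth del "osd.${id}" &&              # step 5.b
  run ceph osd rm "osd.${id}"                   # step 2.b
}
```

After sourcing the file, `DRY_RUN=1 remove_osd 21` only prints the five commands; `remove_osd 21` runs them and stops at the first one that fails.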

Additional steps to reuse a disk without ejecting it (ejecting and replacing the disk takes care of this for us).
Do these last, after following the Ceph docs for removing a disk.
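9.) Wipe the disk: sudo gdisk /dev/sdX, then press x, z, Y, Y (expert menu, zap the GPT, confirm, blank the MBR).
Then look for leftover device-mapper entries with lsblk and remove them: sudo dmsetup remove {name}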

10.) Deploy a /dev/sdX disk from ceph-mon0. You must be in the "my_cluster" folder:
EX: Synergy@Ceph-Mon0:~/my_cluster$ ceph-deploy osd create --data /dev/sdd
(ceph-deploy 2.x takes the target host name as the last argument.)
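Step 9 can be scripted too. A minimal sketch under these assumptions: the old OSD volume appears in `lsblk` as a child of type `lvm` (which is how ceph-volume lays it out), `sgdisk` from the gdisk package is installed, and `run`, `lvm_children`, `wipe_disk` and `DRY_RUN` are hypothetical names. `sgdisk --zap-all` destroys the GPT and MBR, the non-interactive equivalent of x, z, Y, Y in gdisk. On Nautilus, `ceph-volume lvm zap --destroy /dev/sdX` is a built-in alternative that also removes the volume group and logical volume.

```shell
#!/usr/bin/env bash
# Sketch: wipe a disk that used to hold an OSD so it can be redeployed.
# Hypothetical helpers; set DRY_RUN=1 to only print the destructive commands.

# Run a command, or just echo it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Read `lsblk -nro NAME,TYPE <disk>` output on stdin and print the names of
# the LVM children; these are the device-mapper names `dmsetup remove` takes.
lvm_children() {
  awk '$2 == "lvm" { print $1 }'
}

# wipe_disk /dev/sdX: drop stale device-mapper entries on this disk only,
# then destroy its GPT and MBR.
wipe_disk() {
  local dev="$1" name
  [[ -b "$dev" ]] || { echo "not a block device: $dev" >&2; return 1; }
  for name in $(lsblk -nro NAME,TYPE "$dev" | lvm_children); do
    run sudo dmsetup remove "$name" || return
  done
  run sudo sgdisk --zap-all "$dev"
}
```

`DRY_RUN=1 wipe_disk /dev/sdd` shows what would be removed; `lsblk` still runs for real in that mode because it only reads.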
I have attached the doc I use to accomplish this. Before I do it, I mark the OSD “out” via the GUI or CLI and let it reweight to 0%, which I monitor via ceph -s. I do this so that if a disk actually fails while I’m rebuilding an OSD, I am not left with two failed disks at once.
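The drain-before-rebuild step can also be automated. A minimal sketch, assuming Luminous or later so that `ceph osd safe-to-destroy` exists: instead of watching `ceph -s`, it marks the OSD out and polls until Ceph reports that the OSD can be destroyed without reducing data durability. `drain_osd` and the 60-second default interval are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: mark an OSD out and block until its data is safely stored elsewhere.
# drain_osd is a hypothetical helper name.

# drain_osd ID [INTERVAL_SECONDS]
drain_osd() {
  local id="$1" interval="${2:-60}"
  ceph osd out "osd.${id}" || return
  # safe-to-destroy exits non-zero while destroying the OSD would still
  # reduce data durability, so keep polling until it succeeds.
  until ceph osd safe-to-destroy "osd.${id}"; do
    sleep "$interval"
  done
}
```

For example, `drain_osd 21 && echo "osd.21 drained"` returns once it is safe to stop and remove osd.21.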

-Edward Kalk

ceph-users mailing list
