Re: [ceph-users] Replacing a disk: Best practices?
On 15.10.2014 22:08, Iban Cabrillo wrote:
> Hi Cephers, I have another question related to this issue: what would be
> the procedure to restore a failed server (a whole server, for example due
> to a motherboard problem, with no damage to the disks)?
> Regards, I

Hi,
- Replace the mainboard.
- Perhaps adapt /etc/udev/rules.d/70-persistent-net.rules so you get the
  same network devices (eth0/1, ...).
- Boot and wait for the resync.

To avoid too much traffic, I set noout if a whole server is lost.

Udo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
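Udo's whole-node flow can be sketched as a short script. This is a hedged sketch, not verbatim from the mail: the DRYRUN guard is my own addition so the commands print instead of run when there is no cluster at hand.

```shell
# Sketch of the whole-node maintenance flow above. The OSD data disks are
# assumed intact and come back with the node. DRYRUN defaults to "echo" so
# this only prints the commands; set DRYRUN="" on a real admin node to run.
DRYRUN=${DRYRUN:-echo}

$DRYRUN ceph osd set noout    # stop CRUSH from marking the node's OSDs "out"
# ... power down, swap the mainboard, adjust
# /etc/udev/rules.d/70-persistent-net.rules, boot the node ...
$DRYRUN ceph osd unset noout  # re-enable normal failure handling
$DRYRUN ceph -w               # watch the resync finish
```

With noout set, the down OSDs are never marked out, so no data is re-replicated while the node is offline; recovery after boot is only the delta written in the meantime.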
Re: [ceph-users] Replacing a disk: Best practices?
Hi Udo,

Thanks a lot! The resync flag has solved my doubts.

Regards, I

2014-10-16 12:21 GMT+02:00 Udo Lembke ulem...@polarzone.de:
> [...]

-- 
Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969
PGP PUBLIC KEY: http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC
Bertrand Russell: "The trouble with the world is that the stupid are cocksure and the intelligent are full of doubts."
[ceph-users] Replacing a disk: Best practices?
Hi folks,

I recently had an OSD disk die, and I'm wondering what are the current best practices for replacing it. I think I've thoroughly removed the old disk, both physically and logically, but I'm having trouble figuring out how to add the new disk into ceph.

For one thing, taking a look at this:
http://article.gmane.org/gmane.comp.file-systems.ceph.user/5285/match=osd+number
it sounds like I'll need to abandon my beautiful OSD numbering scheme. Is that right?

I've been looking around for instructions about replacing disks, and came across this:
http://karan-mj.blogspot.com/2014/03/admin-guide-replacing-failed-disk-in.html
and this:
http://dachary.org/?p=2428
which sound very different from each other. What procedure do you recommend?

Thanks,
Bryan
Re: [ceph-users] Replacing a disk: Best practices?
Hi,

> I recently had an OSD disk die, and I'm wondering what are the current
> best practices for replacing it. I think I've thoroughly removed the old
> disk, both physically and logically, but I'm having trouble figuring out
> how to add the new disk into ceph.

I did this today (one disk - osd.16 - died ;-):

# @ceph-node3
/etc/init.d/ceph stop osd.16

# delete osd.16
ceph osd crush remove osd.16
ceph auth del osd.16
ceph osd rm osd.16

# remove hdd, plug in new hdd
# /var/log/messages tells me:
Oct 15 09:51:09 ceph-node3 kernel: [1489736.671840] sd 0:0:0:0: [sdd] Synchronizing SCSI cache
Oct 15 09:51:09 ceph-node3 kernel: [1489736.671873] sd 0:0:0:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Oct 15 09:54:56 ceph-node3 kernel: [1489963.094744] sd 0:0:8:0: Attached scsi generic sg4 type 0
Oct 15 09:54:56 ceph-node3 kernel: [1489963.095235] sd 0:0:8:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
Oct 15 09:54:57 ceph-node3 kernel: [1489963.343664] sd 0:0:8:0: [sdd] Attached SCSI disk
-- /dev/sdd

# check /dev/sdd
root@ceph-node3:~# smartctl -a /dev/sdd | less
=== START OF INFORMATION SECTION ===
Device Model:     ST4000NM0033-9ZM170
Serial Number:    Z1Z5LGBX
LU WWN Device Id: 5 000c50 079577e1a
Firmware Version: SN04
User Capacity:    4.000.787.030.016 bytes [4,00 TB]

ID# ATTRIBUTE_NAME        FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count      0x0032 100   100   020    Old_age  Always  -           1
  5 Reallocated_Sector_Ct 0x0033 100   100   010    Pre-fail Always  -           0
-- ok

# new /dev/sdd uses the absolute path: /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX

# create new OSD (with old journal partition)
admin@ceph-admin:~/cluster1$ ceph-deploy osd create ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/admin/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.17): /usr/bin/ceph-deploy osd create ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph-node3:/dev/sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
...
[ceph_deploy.osd][DEBUG ] Host ceph-node3 is now ready for osd use.

# @ceph-admin modify config
admin@ceph-admin:~/cluster1$ ceph osd tree
...
admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
# osd16 was replaced
[osd.16]
...
devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
...

# deploy config
ceph-deploy --overwrite-conf config push ceph-mon{1,2,3} ceph-node{1,2,3} ceph-admin

# enable cluster sync again
ceph osd unset noout

# check
ceph -w

regards
Danny
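Condensed, Danny's replacement steps look like the script below. A hedged sketch: the OSD id, host, disk, and journal path are the values from his mail, and the DRYRUN guard is my own addition so the commands only print unless you clear it on a real admin node.

```shell
# Disk-replacement flow, condensed from the steps above. DRYRUN defaults
# to "echo" (print only); set DRYRUN="" to actually run on the cluster.
DRYRUN=${DRYRUN:-echo}
OSD=16                  # id of the dead OSD
HOST=ceph-node3         # node holding the disk
DISK=sdd                # new data disk
JOURNAL=/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1

$DRYRUN /etc/init.d/ceph stop osd.$OSD   # on $HOST: stop the dead daemon
$DRYRUN ceph osd crush remove osd.$OSD   # take it out of the CRUSH map
$DRYRUN ceph auth del osd.$OSD           # drop its cephx key
$DRYRUN ceph osd rm osd.$OSD             # remove it from the OSD map
# ... physically swap the disk, then recreate the OSD, reusing the journal:
$DRYRUN ceph-deploy osd create $HOST:$DISK:$JOURNAL
```

Because the old id is fully removed first (crush remove, auth del, osd rm), the newly created OSD is free to take the lowest available id - here it comes back as osd.16 again.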
Re: [ceph-users] Replacing a disk: Best practices?
Hi Daniel,

On 15/10/2014 08:02, Daniel Schwager wrote:
> [...]
> admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
> # osd16 was replaced
> [osd.16]
> ...
> devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1

I'm curious about what this is used for.

Thanks a lot for sharing, very interesting read :-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre
Re: [ceph-users] Replacing a disk: Best practices?
Loic,

> root@ceph-node3:~# smartctl -a /dev/sdd | less
> === START OF INFORMATION SECTION ===
> Device Model:     ST4000NM0033-9ZM170
> Serial Number:    Z1Z5LGBX
> ..
> admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
> [osd.16]
> ...
> devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
>
> I'm curious about what this is used for.

The normal device path /dev/sdd1 can change depending on the number and order of disks/controllers. Using the scsi path (which contains the serial number) is always unique:

root@ceph-node3:~# ls -altr /dev/sdd1
brw-rw---T 1 root disk 8, 49 Oct 15 10:06 /dev/sdd1
root@ceph-node3:~# ls -altr /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
lrwxrwxrwx 1 root root 10 Oct 15 10:06 /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1 -> ../../sdd1

regards
Danny
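The stability of those by-id names comes from their being plain symlinks that udev re-points at whatever /dev/sdX name the disk happens to get at boot. A small self-contained demonstration (simulated with a temp directory and a made-up serial so it runs without real hardware; on a node you would use /dev/disk/by-id directly):

```shell
# Simulate /dev/disk/by-id: a symlink with a stable, serial-based name
# pointing at the current kernel device name. Paths and serial are made up.
tmp=$(mktemp -d)
touch "$tmp/sdd1"                                   # stand-in for /dev/sdd1
ln -s "$tmp/sdd1" "$tmp/scsi-SATA_EXAMPLE-part1"    # stable alias
resolved=$(readlink -f "$tmp/scsi-SATA_EXAMPLE-part1")
echo "$resolved"                                    # ends in .../sdd1
```

If the disk were enumerated as sdc tomorrow, udev would recreate the same by-id link pointing at sdc1, so a ceph.conf entry using the by-id path keeps working.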
Re: [ceph-users] Replacing a disk: Best practices?
Hi Cephers,

I have another question related to this issue: what would be the procedure to restore a failed server (a whole server, for example due to a motherboard problem, with no damage to the disks)?

Regards, I

2014-10-15 20:22 GMT+02:00 Daniel Schwager daniel.schwa...@dtnet.de:
> [...]

-- 
Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969
PGP PUBLIC KEY: http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC
Bertrand Russell: "The trouble with the world is that the stupid are cocksure and the intelligent are full of doubts."