Re: [ceph-users] Replacing a disk: Best practices?

2014-10-16 Thread Udo Lembke
On 15.10.2014 22:08, Iban Cabrillo wrote:
 Hi Cephers,
 
  I have another question related to this issue: what would be the
 procedure to recover from a whole-server failure (for example due to a
 motherboard problem, with no damage to the disks)?
 
 Regards, I
 
Hi,
- replace the motherboard.
- perhaps adapt /etc/udev/rules.d/70-persistent-net.rules (to get the
same device names (eth0/1, ...) for your network).
- boot and wait for the resync.

To avoid too much traffic, I set noout if a whole server is lost.
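
A minimal sketch of the flag handling around such a repair (standard ceph
CLI commands; when exactly you set the flag is a judgment call):

    # as soon as the server fails (or before planned maintenance):
    ceph osd set noout
    # ...replace the board, boot, let the OSDs rejoin and resync...
    # then re-enable normal rebalancing:
    ceph osd unset noout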


Udo


Re: [ceph-users] Replacing a disk: Best practices?

2014-10-16 Thread Iban Cabrillo
Hi Udo,
  Thanks a lot! The resync/noout tip has cleared up my doubts.

Regards, I

2014-10-16 12:21 GMT+02:00 Udo Lembke ulem...@polarzone.de:

 On 15.10.2014 22:08, Iban Cabrillo wrote:
  Hi Cephers,
 
   I have another question related to this issue: what would be the
  procedure to recover from a whole-server failure (for example due to a
  motherboard problem, with no damage to the disks)?
 
  Regards, I
 
 Hi,
 - replace the motherboard.
 - perhaps adapt /etc/udev/rules.d/70-persistent-net.rules (to get the
 same device names (eth0/1, ...) for your network).
 - boot and wait for the resync.

 To avoid too much traffic, I set noout if a whole server is lost.


 Udo




-- 

Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969
PGP PUBLIC KEY:
http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC

Bertrand Russell:
*The trouble with the world is that the stupid are cocksure and the
intelligent are full of doubt*


[ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Bryan Wright
Hi folks,

I recently had an OSD disk die, and I'm wondering what the
current best practices are for replacing it.  I think I've thoroughly removed
the old disk, both physically and logically, but I'm having trouble figuring
out how to add the new disk back into ceph.

For one thing, taking a look at this:

http://article.gmane.org/gmane.comp.file-systems.ceph.user/5285/match=osd+number

it sounds like I'll need to abandon my beautiful OSD numbering scheme.  Is
that right?

I've been looking around for instructions about replacing disks, and
came across this:

http://karan-mj.blogspot.com/2014/03/admin-guide-replacing-failed-disk-in.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+CephStorageNextBigThing+(Ceph+Storage+%3A%3A+Next+Big+Thing)

and this:

http://dachary.org/?p=2428

which sound very different from each other.

   What procedure do you recommend?

Thanks,
Bryan



Re: [ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Daniel Schwager
Hi,

 I recently had an OSD disk die, and I'm wondering what the
 current best practices are for replacing it.  I think I've thoroughly removed
 the old disk, both physically and logically, but I'm having trouble figuring
 out how to add the new disk back into ceph.

I did this today (one disk - osd.16 - died ;-):

# @ceph-node3
/etc/init.d/ceph stop osd.16

# delete osd.16 (drop it from the CRUSH map, delete its auth key, remove the OSD id)
ceph osd crush remove osd.16
ceph auth del osd.16
ceph osd rm osd.16

# remove hdd, plugin new hdd
# /var/log/messages tells me
Oct 15 09:51:09 ceph-node3 kernel: [1489736.671840] sd 0:0:0:0: [sdd] Synchronizing SCSI cache
Oct 15 09:51:09 ceph-node3 kernel: [1489736.671873] sd 0:0:0:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Oct 15 09:54:56 ceph-node3 kernel: [1489963.094744] sd 0:0:8:0: Attached scsi generic sg4 type 0
Oct 15 09:54:56 ceph-node3 kernel: [1489963.095235] sd 0:0:8:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
Oct 15 09:54:57 ceph-node3 kernel: [1489963.343664] sd 0:0:8:0: [sdd] Attached SCSI disk
--> /dev/sdd
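# (aside, not from the original mail: these attach messages can also be
#  followed live with "tail -f /var/log/messages" or checked via "dmesg")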


# check /dev/sdd
root@ceph-node3:~#  smartctl -a /dev/sdd | less
=== START OF INFORMATION SECTION ===
Device Model:     ST4000NM0033-9ZM170
Serial Number:    Z1Z5LGBX
LU WWN Device Id: 5 000c50 079577e1a
Firmware Version: SN04
User Capacity:    4.000.787.030.016 bytes [4,00 TB]
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           1
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           0
--> ok

# new /dev/sdd uses the absolute path:
/dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX

# create new OSD (re-using the old journal partition)
admin@ceph-admin:~/cluster1$ ceph-deploy osd create ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/admin/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.17): /usr/bin/ceph-deploy osd create ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph-node3:/dev/sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
...
[ceph_deploy.osd][DEBUG ] Host ceph-node3 is now ready for osd use.

# @ceph-admin modify config
admin@ceph-admin:~/cluster1$ ceph osd tree
...
admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
# osd16 was replaced

[osd.16]
...
devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
...

# deploy config
ceph-deploy --overwrite-conf config push ceph-mon{1,2,3} ceph-node{1,2,3} ceph-admin

# re-enable cluster sync (clear the noout flag)
ceph osd unset noout

# check
ceph -w
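
(The unset above implies the noout flag was set earlier, before stopping
osd.16, presumably with

    ceph osd set noout

so that the cluster did not start rebalancing while the disk was swapped.)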

regards
Danny




Re: [ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Loic Dachary
Hi Daniel,

On 15/10/2014 08:02, Daniel Schwager wrote:
 Hi,
 
 I recently had an OSD disk die, and I'm wondering what the
 current best practices are for replacing it.  I think I've thoroughly removed
 the old disk, both physically and logically, but I'm having trouble figuring
 out how to add the new disk back into ceph.
 
 I did this today (one disk - osd.16 - died ;-):
 
 # @ceph-node3
 /etc/init.d/ceph stop osd.16

 # delete osd.16 (drop it from the CRUSH map, delete its auth key, remove the OSD id)
 ceph osd crush remove osd.16
 ceph auth del osd.16
 ceph osd rm osd.16
 
 # remove hdd, plugin new hdd
 # /var/log/messages tells me
 Oct 15 09:51:09 ceph-node3 kernel: [1489736.671840] sd 0:0:0:0: [sdd] Synchronizing SCSI cache
 Oct 15 09:51:09 ceph-node3 kernel: [1489736.671873] sd 0:0:0:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
 Oct 15 09:54:56 ceph-node3 kernel: [1489963.094744] sd 0:0:8:0: Attached scsi generic sg4 type 0
 Oct 15 09:54:56 ceph-node3 kernel: [1489963.095235] sd 0:0:8:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
 Oct 15 09:54:57 ceph-node3 kernel: [1489963.343664] sd 0:0:8:0: [sdd] Attached SCSI disk
 --> /dev/sdd
 
 
 # check /dev/sdd
 root@ceph-node3:~#  smartctl -a /dev/sdd | less
 === START OF INFORMATION SECTION ===
 Device Model:     ST4000NM0033-9ZM170
 Serial Number:    Z1Z5LGBX
 LU WWN Device Id: 5 000c50 079577e1a
 Firmware Version: SN04
 User Capacity:    4.000.787.030.016 bytes [4,00 TB]
 ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
   4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           1
   5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           0
 --> ok
 
 # new /dev/sdd uses the absolute path:
 /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX
 
 # create new OSD (re-using the old journal partition)
 admin@ceph-admin:~/cluster1$ ceph-deploy osd create ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
 [ceph_deploy.conf][DEBUG ] found configuration file at: /home/admin/.cephdeploy.conf
 [ceph_deploy.cli][INFO  ] Invoked (1.5.17): /usr/bin/ceph-deploy osd create ceph-node3:sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
 [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph-node3:/dev/sdd:/dev/disk/by-id/scsi-SATA_INTEL_SSDSC2BA1BTTV330609AU100FGN-part1
 ...
 [ceph_deploy.osd][DEBUG ] Host ceph-node3 is now ready for osd use.
 
 # @ceph-admin modify config
 admin@ceph-admin:~/cluster1$ ceph osd tree
 ...
 admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
 # osd16 was replaced
 
 [osd.16]
 ...
 devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1


I'm curious about what this is used for.

Thanks a lot for sharing, very interesting read :-)

Cheers

 ...
 
 # deploy config
 ceph-deploy --overwrite-conf config push ceph-mon{1,2,3} ceph-node{1,2,3} ceph-admin

 # re-enable cluster sync (clear the noout flag)
 ceph osd unset noout
 
 # check
 ceph -w
 
 regards
 Danny
 
 
 
 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: [ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Daniel Schwager
Loic,

  root@ceph-node3:~#  smartctl -a /dev/sdd | less
  === START OF INFORMATION SECTION ===
  Device Model:     ST4000NM0033-9ZM170
  Serial Number:    Z1Z5LGBX
 
.. 
  admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
  [osd.16]
  ...
  devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
 
 
 I'm curious about what this is used for.

The normal device path /dev/sdd1 can change depending on the number and
order of disks/controllers. The scsi path (which contains the drive's
serial number), on the other hand, is always unique:

root@ceph-node3:~# ls -altr /dev/sdd1
brw-rw---T 1 root disk 8, 49 Oct 15 10:06 /dev/sdd1

root@ceph-node3:~# ls -altr /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
lrwxrwxrwx 1 root root 10 Oct 15 10:06 /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1 -> ../../sdd1
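
To find the stable by-id path for a given kernel device, a quick sketch
with standard tools (commands illustrative, not from the original mail):

    # list the by-id symlinks that resolve to sdd partitions
    ls -l /dev/disk/by-id/ | grep sdd
    # or ask udev directly for all symlinks of one device
    udevadm info --query=symlink --name=/dev/sdd1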

regards
Danny




Re: [ceph-users] Replacing a disk: Best practices?

2014-10-15 Thread Iban Cabrillo
Hi Cephers,

 I have another question related to this issue: what would be the
procedure to recover from a whole-server failure (for example due to a
motherboard problem, with no damage to the disks)?

Regards, I

2014-10-15 20:22 GMT+02:00 Daniel Schwager daniel.schwa...@dtnet.de:

 Loic,

   root@ceph-node3:~#  smartctl -a /dev/sdd | less
   === START OF INFORMATION SECTION ===
   Device Model:     ST4000NM0033-9ZM170
   Serial Number:    Z1Z5LGBX
  
 ..
   admin@ceph-admin:~/cluster1$ emacs -nw ceph.conf
   [osd.16]
   ...
   devs = /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
  
 
  I'm curious about what this is used for.

 The normal device path /dev/sdd1 can change depending on the
 number and order of disks/controllers. The scsi path (which contains the
 drive's serial number), on the other hand, is always unique:

 root@ceph-node3:~# ls -altr /dev/sdd1
 brw-rw---T 1 root disk 8, 49 Oct 15 10:06 /dev/sdd1

 root@ceph-node3:~# ls -altr /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1
 lrwxrwxrwx 1 root root 10 Oct 15 10:06 /dev/disk/by-id/scsi-SATA_ST4000NM0033-9Z_Z1Z5LGBX-part1 -> ../../sdd1

 regards
 Danny





-- 

Iban Cabrillo Bartolome
Instituto de Fisica de Cantabria (IFCA)
Santander, Spain
Tel: +34942200969
PGP PUBLIC KEY:
http://pgp.mit.edu/pks/lookup?op=get&search=0xD9DF0B3D6C8C08AC

Bertrand Russell:
*The trouble with the world is that the stupid are cocksure and the
intelligent are full of doubt*