Re: [ceph-users] jbod + SMART : how to identify failing disks ?
Hi,

Back on this. I finally found some logic in the mapping. After taking the time to note all the disk serial numbers on 3 different machines and 2 different OSes, I now know that my specific LSI SAS 2008 cards (no reference on them, but I think they are LSI SAS 9207-8i) map the disks of the MD1000 in reverse alphabetical order: sd{b..p} map to slot{14..0}.

There is absolutely nothing else that appears usable, except the sas_address of the disks, which seems associated with slots. But even that differs between machines, and the address <-> slot mapping does not seem obvious, to say the least... The good news is that I now know that useful tools exist in packages such as sg3_utils, smp_utils, and others like mpt-status...

Next step is to try an MD1200 ;)

Thanks again. Cheers

-----Original Message-----
From: JF Le Fillâtre [mailto:jean-francois.lefilla...@uni.lu]
Sent: Wednesday, 19 November 2014 13:42
To: SCHAER Frederic
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ?

Hello again,

So whatever magic allows the Dell MD1200 to report the slot position for each disk isn't present in your JBODs. Time for something else. There are two sides to your problem:

1) Identifying which disk is where in your JBOD

Quite easy. Again, I'd go for a udev rule + script that will either rename the disks entirely, or create a symlink with a name like "jbodX-slotY" or similar, so you can easily figure out which is which. The end-device-to-slot mapping can be static in the script, so you need to identify once the order in which the kernel scans the slots, and then you can map. But it won't survive a disk swap or a change of scanning order from a kernel upgrade, so it's not enough.

2) Finding a way of identification independent of hot-plugs and scan order

That's the tricky part.
If you remove a disk from your JBOD and replace it with another one, the new one will get another "sdX" name, and in my experience even another "end_device-..." name. But given that you want the new disk to have the exact same name or symlink as the previous one, you have to find something in the path of the device or (better) in the udev attributes that is immutable. Whether that is possible at all depends on your specific hardware combination, so you will have to try for yourself.

Suggested methodology:

1) Write down the serial number of one drive in any slot, and figure out its device name (sdX) with "smartctl -i /dev/sd..."
2) Grab the detailed /sys path name and the list of udev attributes:
readlink -f /sys/class/block/sdX
udevadm info --attribute-walk /dev/sdX
3) Pull that disk and replace it. Check the logs to see its new device name (sdY).
4) Rerun the commands from #2 with sdY.
5) Compare the outputs and find something in the path or in the attributes that didn't change and is unique to that disk (i.e. not a common parent, for example).

If you have something that really didn't change, you're in luck. Either use the serial numbers, or unplug and replug all disks one by one, to figure out the slot-number-to-immutable-item mapping. Then write the udev rule. :)

Thanks!
JF

On 19/11/14 11:29, SCHAER Frederic wrote:
> Hi
>
> Thanks.
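Step 5 of JF's methodology (finding which attributes survived the swap) can be partly automated. The sketch below is my own addition, not from the thread: `common_attrs` is a hypothetical helper that keeps only the lines present in both `udevadm info --attribute-walk` dumps, which are the candidates for a swap-stable identifier.

```shell
#!/bin/sh
# Hypothetical helper for step 5: given two saved attribute-walk dumps
# (before.attrs from sdX, after.attrs from the replacement sdY), print the
# lines present in both -- candidates for a swap-stable identifier.
common_attrs() {
    sort "$1" > /tmp/_attrs_a.$$
    sort "$2" > /tmp/_attrs_b.$$
    comm -12 /tmp/_attrs_a.$$ /tmp/_attrs_b.$$   # lines common to both files
    rm -f /tmp/_attrs_a.$$ /tmp/_attrs_b.$$
}

# Typical use (sdX/sdY are placeholders, per steps 2 and 4 above):
#   udevadm info --attribute-walk /dev/sdX > before.attrs
#   ...swap the disk...
#   udevadm info --attribute-walk /dev/sdY > after.attrs
#   common_attrs before.attrs after.attrs
```

Whatever survives the diff still has to be checked for uniqueness per disk, as JF notes, since common parent attributes will also survive.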
> I hoped it would be it, but no ;)
>
> With this mapping :
>
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdb -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:1/1:0:1:0/block/sdb
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdc -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:1/end_device-1:1:1/target1:0:2/1:0:2:0/block/sdc
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdd -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:2/end_device-1:1:2/target1:0:3/1:0:3:0/block/sdd
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sde -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:3/end_device-1:1:3/target1:0:4/1:0:4:0/block/sde
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdf -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:4/end_device-1:1:4/target1:0:5/1:0:5:0/block/sdf
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdg -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:5/end_device-1:1:5/target1:0:6/1:0:6:0/block/sdg
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdh -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:
-> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:3/end_device-1:2:3/target1:0:12/1:0:12:0/block/sdm
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdn -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:4/end_device-1:2:4/target1:0:13/1:0:13:0/block/sdn
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdo -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:5/end_device-1:2:5/target1:0:14/1:0:14:0/block/sdo
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdp -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:6/end_device-1:2:6/target1:0:15/1:0:15:0/block/sdp
>
> sdd was on physical slot 12, sdk was on slot 5, and sdg was on slot 9 (and I did not check the others)...
> so clearly this cannot be put into production as is, and I'll have to find a way.
>
> Regards
>
> -----Original Message-----
> From: Carl-Johan Schenström [mailto:carl-johan.schenst...@gu.se]
> Sent: Monday, 17 November 2014 14:14
> To: SCHAER Frederic; Scottix; Erik Logtenberg
> Cc: ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] jbod + SMART : how to identify failing disks ?
>
> Hi!
>
> I'm fairly sure that the link targets in /sys/class/block were correct the last time I had to change a drive on a system with a Dell HBA connected to an MD1000, but perhaps I was just lucky. =/
>
> I.e.,
>
> # ls -l /sys/class/block/sdj
> lrwxrwxrwx. 1 root root 0 17 nov 13.54 /sys/class/block/sdj -> ../../devices/pci:20/:20:0a.0/:21:00.0/host7/port-7:0/expander-7:0/port-7:0:1/expander-7:2/port-7:2:6/end_device-7:2:6/target7:0:7/7:0:7:0/block/sdj
>
> That would be the first port on the HBA, first expander, 7th slot (6, starting from 0). Don't take my word for it, though!

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
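The static end-device-to-slot script JF suggested earlier in the thread can be sketched for the mapping Frederic eventually confirmed (sd{b..p} -> slot{14..0}, reverse-alphabetical). The function name is mine, and the hard-coded order is exactly the fragile assumption JF warns about: a disk swap or a kernel scan-order change breaks it.

```shell
#!/bin/sh
# Hypothetical static mapping helper: translate an sdX name into an MD1000
# slot number, assuming the reverse-alphabetical order reported in this
# thread (sdb -> slot 14, sdc -> slot 13, ..., sdp -> slot 0).
slot_for_dev() {
    dev=${1#/dev/}                  # accept either "sdb" or "/dev/sdb"
    letter=${dev#sd}                # single drive letter, e.g. "b"
    code=$(printf '%d' "'$letter")  # ASCII code of the letter ('b' -> 98)
    echo $((15 - (code - 97)))      # 'b' (index 1) -> 14 ... 'p' (index 15) -> 0
}

slot_for_dev sdb       # prints 14
slot_for_dev /dev/sdp  # prints 0
```

A udev rule could call a script like this to create the "jbodX-slotY" symlinks JF describes, but only after verifying the scan order on each machine.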
Re: [ceph-users] jbod + SMART : how to identify failing disks ?
Hi,

Thanks. I hoped it would be it, but no ;)

With this mapping :

lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdb -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:1/1:0:1:0/block/sdb
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdc -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:1/end_device-1:1:1/target1:0:2/1:0:2:0/block/sdc
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdd -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:2/end_device-1:1:2/target1:0:3/1:0:3:0/block/sdd
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sde -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:3/end_device-1:1:3/target1:0:4/1:0:4:0/block/sde
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdf -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:4/end_device-1:1:4/target1:0:5/1:0:5:0/block/sdf
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdg -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:5/end_device-1:1:5/target1:0:6/1:0:6:0/block/sdg
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdh -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:6/end_device-1:1:6/target1:0:7/1:0:7:0/block/sdh
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdi -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:7/end_device-1:1:7/target1:0:8/1:0:8:0/block/sdi
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdj -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:0/end_device-1:2:0/target1:0:9/1:0:9:0/block/sdj
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdk -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:1/end_device-1:2:1/target1:0:10/1:0:10:0/block/sdk
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdl -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:2/end_device-1:2:2/target1:0:11/1:0:11:0/block/sdl
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdm -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:3/end_device-1:2:3/target1:0:12/1:0:12:0/block/sdm
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdn -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:4/end_device-1:2:4/target1:0:13/1:0:13:0/block/sdn
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdo -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:5/end_device-1:2:5/target1:0:14/1:0:14:0/block/sdo
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdp -> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:6/end_device-1:2:6/target1:0:15/1:0:15:0/block/sdp

sdd was on physical slot 12, sdk was on slot 5, and sdg was on slot 9 (and I did not check the others)... so clearly this cannot be put into production as is, and I'll have to find a way.

Regards

-----Original Message-----
From: Carl-Johan Schenström [mailto:carl-johan.schenst...@gu.se]
Sent: Monday, 17 November 2014 14:14
To: SCHAER Frederic; Scottix; Erik Logtenberg
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] jbod + SMART : how to identify failing disks ?

Hi!

I'm fairly sure that the link targets in /sys/class/block were correct the last time I had to change a drive on a system with a Dell HBA connected to an MD1000, but perhaps I was just lucky. =/

I.e.,

# ls -l /sys/class/block/sdj
lrwxrwxrwx. 1 root root 0 17 nov 13.54 /sys/class/block/sdj -> ../../devices/pci:20/:20:0a.0/:21:00.0/host7/port-7:0/expander-7:0/port-7:0:1/expander-7:2/port-7:2:6/end_device-7:2:6/target7:0:7/7:0:7:0/block/sdj

That would be the first port on the HBA, first expander, 7th slot (6, starting from 0). Don't take my word for it, though!

--
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst / The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769

From: ceph-users on behalf of SCHAER Frederic
Sent: Friday, November 14, 2014 17:24
To: Scottix; Erik Logtenberg
Cc: ceph-u
Re: [ceph-users] jbod + SMART : how to identify failing disks ?
Wow. Thanks.

Not very operations-friendly, though... Wouldn't it just be OK to pull the disk we think is the bad one, check the serial number, and if it's not the right one, replug it, let the udev rules do their job, and re-insert the disk into the ceph cluster? (Provided XFS doesn't freeze for good when we do that.)

Regards

-----Original Message-----
From: Craig Lewis [mailto:cle...@centraldesktop.com]
Sent: Monday, 17 November 2014 22:32
To: SCHAER Frederic
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ?

I use `dd` to force activity to the disk I want to replace, and watch the activity lights. That only works if your disks aren't 100% busy. If they are, stop the ceph-osd daemon, and see which drive stops having activity. Repeat until you're 100% confident that you're pulling the right drive.

On Wed, Nov 12, 2014 at 5:05 AM, SCHAER Frederic <frederic.sch...@cea.fr> wrote:

Hi,

I'm used to RAID software giving me the failing disks' slots, and most often blinking the disks on the disk bays. I recently installed a Dell "6GB HBA SAS" JBOD card, said to be an LSI 2008 one, and I now have to identify 3 pre-failed disks (so says S.M.A.R.T.).

Since this is an LSI, I thought I'd use MegaCli to identify the disk slots, but MegaCli does not see the HBA card. Then I found the LSI "sas2ircu" utility, but again, this one fails at giving me the disk slots (it finds the disks, serials and others, but the slot is always 0). Because of this, I'm going to head over to the disk bay and unplug the disk which I think corresponds to the alphabetical order in Linux, and see if it's the correct one.... But even if this is correct this time, it might not be next time.

But this makes me wonder: how do you guys, Ceph users, manage your disks if you really have JBOD servers?
I can't imagine having to guess slots like that each time, and I can't imagine creating serial-number stickers for every single disk I might have to manage either...

Is there any specific advice regarding JBOD cards people should (not) use in their systems? Any magical way to "blink" a drive in Linux?

Thanks && regards

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] jbod + SMART : how to identify failing disks ?
On Mon, 17 Nov 2014 13:31:57 -0800 Craig Lewis wrote:

> I use `dd` to force activity to the disk I want to replace, and watch the
> activity lights. That only works if your disks aren't 100% busy. If
> they are, stop the ceph-osd daemon, and see which drive stops having
> activity. Repeat until you're 100% confident that you're pulling the
> right drive.

I use smartctl for lighting up the disk, but same diff.

JBOD can quickly become a big PITA with large deployments, or if you don't have people with sufficient skill doing disk replacements. Also, depending on how a disk died, you might not be able to reclaim the drive ID (sdc, for example) without a reboot, making things even more confusing.

Some RAID cards in IT/JBOD mode _will_ actually light up the fail LED if a disk fails and/or have tools to blink a specific disk. However, with the latter, the task of matching a disk from the controller's perspective to what Linux enumerated it as is still on you.

Ceph might scale up to really large deployments, but you had better have a well-staffed data center to go with that, or deploy it in a non-JBOD fashion.

Christian

> On Wed, Nov 12, 2014 at 5:05 AM, SCHAER Frederic wrote:
> > Hi,
> >
> > I'm used to RAID software giving me the failing disks' slots, and most
> > often blinking the disks on the disk bays.
> >
> > I recently installed a Dell "6GB HBA SAS" JBOD card, said to be an LSI
> > 2008 one, and I now have to identify 3 pre-failed disks (so says
> > S.M.A.R.T.).
> >
> > Since this is an LSI, I thought I'd use MegaCli to identify the disk
> > slots, but MegaCli does not see the HBA card.
> > Then I found the LSI "sas2ircu" utility, but again, this one fails at
> > giving me the disk slots (it finds the disks, serials and others, but
> > the slot is always 0).
> > Because of this, I'm going to head over to the disk bay and unplug the
> > disk which I think corresponds to the alphabetical order in Linux, and
> > see if it's the correct one....
> > But even if this is correct this time, it might not be next time.
> >
> > But this makes me wonder: how do you guys, Ceph users, manage your
> > disks if you really have JBOD servers?
> > I can't imagine having to guess slots like that each time, and I can't
> > imagine creating serial-number stickers for every single disk
> > I might have to manage either...
> > Is there any specific advice regarding JBOD cards people should (not)
> > use in their systems?
> > Any magical way to "blink" a drive in Linux?
> >
> > Thanks && regards

--
Christian Balzer    Network/Systems Engineer
ch...@gol.com       Global OnLine Japan/Fusion Communications
http://www.gol.com/
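On Christian's point about reclaiming a drive ID without a reboot: one commonly used approach (whether it helps depends on the driver and on how the disk died, so his caveat stands) is to ask the SCSI host to re-probe its bus through sysfs. The wrapper function and its overridable base path are my own, for illustration:

```shell
#!/bin/sh
# Sketch: trigger a SCSI host rescan by writing "- - -"
# (channel/target/lun wildcards) into its sysfs scan file.
# The optional second argument only exists so the function can be
# exercised against a fake directory tree instead of the real /sys.
rescan_host() {
    base=${2:-/sys/class/scsi_host}
    echo "- - -" > "$base/$1/scan"
}

# real-world use, after swapping a disk behind host1:
#   rescan_host host1
```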
Re: [ceph-users] jbod + SMART : how to identify failing disks ?
Sorry, I forgot to say that in "Slot X/device/block" you can find the device name, like "sdc".

Cheers

On 18/11/2014 00:15, Cedric Lemarchand wrote:
> Hi,
>
> Try looking for a file named "locate" in a folder named "Slot X", where X is
> the number of the slot; echoing 1 into the "locate" file will make the LED blink:
>
> # find /sys -name "locate" | grep Slot
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 01/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 02/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 03/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 04/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 05/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 06/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 07/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 08/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 09/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 10/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 11/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 12/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 13/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 14/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 15/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 16/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 17/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 18/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 19/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 20/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 21/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 22/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 23/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 24/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 25/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 26/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 27/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/en
Re: [ceph-users] jbod + SMART : how to identify failing disks ?
Hi,

Try looking for a file named "locate" in a folder named "Slot X", where X is the number of the slot; echoing 1 into the "locate" file will make the LED blink:

# find /sys -name "locate" | grep Slot
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 01/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 02/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 03/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 04/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 05/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 06/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 07/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 08/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 09/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 10/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 11/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 12/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 13/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 14/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 15/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 16/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 17/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 18/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 19/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 20/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 21/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 22/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 23/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 24/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 25/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 26/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 27/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot 28/locate

LSI 9200-8e with a Supermicro JBOD, 28 slots, Ubuntu 12.04, 3.13 kernel.

Cheers

On 12/11/2014 14:05, SCHAER Frederic wrote:
> Hi,
>
> I'm used to RAID software giving me the failing disks slot
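Cedric's two tips (the "locate" files above and the "Slot X/device/block" trick from his follow-up) can be combined into one small helper. The function names and the enclosure-path argument are my own; the sysfs layout is the one shown in the listings above:

```shell
#!/bin/sh
# Hypothetical helpers around the "Slot NN/locate" and "Slot NN/device/block"
# files shown above. The first argument is the enclosure directory,
# e.g. the ".../enclosure/6:0:28:0" path from the find output.

list_slots() {
    for slot in "$1"/Slot*; do
        [ -d "$slot" ] || continue
        dev=$(ls "$slot/device/block" 2>/dev/null)   # e.g. "sdc"
        printf '%s -> %s\n' "$(basename "$slot")" "${dev:-<empty>}"
    done
}

blink_slot() {                # e.g. blink_slot ".../Slot 05"
    echo 1 > "$1/locate"      # echo 0 into the same file to stop blinking
}
```

With a populated enclosure tree, `list_slots` prints one "Slot NN -> sdX" line per bay, giving the slot-to-device mapping that MegaCli and sas2ircu failed to provide earlier in the thread.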
Re: [ceph-users] jbod + SMART : how to identify failing disks ?
I use `dd` to force activity to the disk I want to replace, and watch the activity lights. That only works if your disks aren't 100% busy. If they are, stop the ceph-osd daemon, and see which drive stops having activity. Repeat until you're 100% confident that you're pulling the right drive.

On Wed, Nov 12, 2014 at 5:05 AM, SCHAER Frederic wrote:

> Hi,
>
> I'm used to RAID software giving me the failing disks' slots, and most
> often blinking the disks on the disk bays.
>
> I recently installed a Dell "6GB HBA SAS" JBOD card, said to be an LSI
> 2008 one, and I now have to identify 3 pre-failed disks (so says S.M.A.R.T.).
>
> Since this is an LSI, I thought I'd use MegaCli to identify the disk
> slots, but MegaCli does not see the HBA card.
> Then I found the LSI "sas2ircu" utility, but again, this one fails at
> giving me the disk slots (it finds the disks, serials and others, but the slot
> is always 0).
> Because of this, I'm going to head over to the disk bay and unplug the
> disk which I think corresponds to the alphabetical order in Linux, and see
> if it's the correct one.... But even if this is correct this time, it might
> not be next time.
>
> But this makes me wonder: how do you guys, Ceph users, manage your disks
> if you really have JBOD servers?
> I can't imagine having to guess slots like that each time, and I can't imagine
> creating serial-number stickers for every single disk I might have
> to manage either...
> Is there any specific advice regarding JBOD cards people should (not) use
> in their systems?
> Any magical way to "blink" a drive in Linux?
>
> Thanks && regards
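Craig's dd trick, wrapped into a tiny function for convenience. The wrapper and its optional read-count argument are my additions; with no count, it reads until interrupted, which is the point when you are watching activity LEDs.

```shell
#!/bin/sh
# Blink-by-activity: hammer a device with sequential reads so its activity
# LED stays lit; Ctrl-C once you have spotted the drive in the bay.
blink_by_activity() {
    dev=$1
    count=$2   # optional: limit the number of 1 MiB reads (mainly for testing)
    dd if="$dev" of=/dev/null bs=1M ${count:+count=$count} 2>/dev/null
}

# real-world use, on the drive you suspect:
#   blink_by_activity /dev/sdg
```

As Craig says, this only discriminates when the other disks are not equally busy; otherwise stop the ceph-osd daemon first and watch which LED goes quiet.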
Re: [ceph-users] jbod + SMART : how to identify failing disks ?
Hi!

I'm fairly sure that the link targets in /sys/class/block were correct the last time I had to change a drive on a system with a Dell HBA connected to an MD1000, but perhaps I was just lucky. =/

I.e.,

# ls -l /sys/class/block/sdj
lrwxrwxrwx. 1 root root 0 17 nov 13.54 /sys/class/block/sdj -> ../../devices/pci:20/:20:0a.0/:21:00.0/host7/port-7:0/expander-7:0/port-7:0:1/expander-7:2/port-7:2:6/end_device-7:2:6/target7:0:7/7:0:7:0/block/sdj

That would be the first port on the HBA, first expander, 7th slot (6, starting from 0). Don't take my word for it, though!

--
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst / The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769

From: ceph-users on behalf of SCHAER Frederic
Sent: Friday, November 14, 2014 17:24
To: Scottix; Erik Logtenberg
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ?

Hi,

Thanks for your replies :]

Indeed, I did not think about /sys/class/leds, but unfortunately I have nothing in there on my systems. This is kernel-related, so I presume it would be the module's duty to expose LEDs there (in my case, mpt2sas)... that would indeed be welcome!

/sys/block is not of great help either, unfortunately. The last thing I haven't tried is to compile the Dell driver and try it instead of the kernel one - sigh -, or an elrepo kernel...

[root@ceph0 ~]# cat /sys/block/sd*/../../../../sas_device/end_device-*/bay_identifier
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

Maybe a kernel bug...

Regards

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Scottix
Sent: Wednesday, 12 November 2014 18:43
To: Erik Logtenberg
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ?

I would say it depends on your system and where the drives are connected to.
Some HBAs have a CLI tool to manage the connected drives, like a RAID card would. One other method I found is that sometimes the LEDs are exposed for you: http://fabiobaltieri.com/2011/09/21/linux-led-subsystem/ has an article on /sys/class/leds, but there is no guarantee. On my laptop I could turn on lights and such, but our server didn't have anything. This seems like a feature either Linux or smartctl should have.

I have run into this problem before and used a couple of tricks to figure it out. I guess the best solution is just to track the drives' serial numbers. Maybe a good note to have in the docs for a Ceph cluster to be aware of.

On Wed, Nov 12, 2014 at 9:06 AM, Erik Logtenberg wrote:

> I have no experience with the DELL SAS controller, but usually the
> advantage of using a simple controller (instead of a RAID card) is that
> you can use full SMART directly.
>
> $ sudo smartctl -a /dev/sda
>
> === START OF INFORMATION SECTION ===
> Device Model: INTEL SSDSA2BW300G3H
> Serial Number: PEPR2381003E300EGN
>
> Personally, I make sure that I know which serial number drive is in
> which bay, so I can easily tell which drive I'm talking about.
> So you can use SMART both to notice (pre)failing disks -and- to
> physically identify them.
>
> The same smartctl command also returns the health status like so:
>
> 233 Media_Wearout_Indicator 0x0032 099 099 000 Old_age Always - 0
>
> This specific SSD has 99% media lifetime left, so it's in the green. But
> it will continue to gradually degrade, and at some point it'll hit a
> percentage where I'd like to replace it. To keep an eye on the speed of
> decay, I'm graphing those SMART values in Cacti. That way I can somewhat
> predict how long a disk will last, especially SSDs, which die very
> gradually.
>
> Erik.
>
>
> On 12-11-14 14:43, JF Le Fillâtre wrote:
>>
>> Hi,
>>
>> May or may not work, depending on your JBOD and the way it's identified
>> and set up by the LSI card and the kernel:
>>
>> cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier
>>
>> The weird path and the wildcards are due to the way sysfs is set up.
>>
>> That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running
>> CentOS release 6.5.
>>
>> Note that you can make your life easier by writing a udev script that
>> will create a symlink with a sane identifier for each of your external
>> disks. If you match along the lines of
>>
>> KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*"
>>
>> then you'll just have to cat "/sys/class/sas_device/${1}/bay_identifier"
>> in a script (with $1 being the $id of udev after that match, so the
>> string "end_device-X:Y:Z") to obtain the bay ID.
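[Editor's note: JF's bay_identifier lookup above can be scripted end to end. This is only a sketch under the thread's assumptions: it presumes the SAS transport layer actually populates bay_identifier (on Frederic's mpt2sas setup it printed all zeros, so the result may be useless on some HBAs), and the SYSFS_SAS override exists purely so the helper can be exercised away from real hardware.]

```shell
#!/bin/sh
# Sketch: map each sdX disk to its SAS end_device and reported bay.
# Hardware-dependent; bay_identifier may be absent or all zeros.

sas_bay_id() {
    # $1 is an end device name such as "end_device-7:2:6",
    # i.e. the string udev's $id holds after a KERNELS match.
    base="${SYSFS_SAS:-/sys/class/sas_device}"
    cat "$base/$1/bay_identifier" 2>/dev/null
}

# Walk every SCSI disk and print "name end_device bay" where available.
list_bays() {
    for dev in /sys/class/block/sd*; do
        path=$(readlink -f "$dev")
        # Pull the end_device-X:Y:Z component out of the resolved path.
        ed=$(expr "$path" : '.*\(end_device-[0-9]*:[0-9]*:[0-9]*\)')
        [ -n "$ed" ] && printf '%s %s %s\n' "${dev##*/}" "$ed" "$(sas_bay_id "$ed")"
    done
}
```

Run `list_bays` on the storage node itself; cross-check the output against one known drive (serial via smartctl) before trusting the mapping.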
Re: [ceph-users] jbod + SMART : how to identify failing disks ?
Hi,

May or may not work, depending on your JBOD and the way it's identified and set up by the LSI card and the kernel:

cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier

The weird path and the wildcards are due to the way sysfs is set up.

That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running CentOS release 6.5.

Note that you can make your life easier by writing a udev script that will create a symlink with a sane identifier for each of your external disks. If you match along the lines of

KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*"

then you'll just have to cat "/sys/class/sas_device/${1}/bay_identifier" in a script (with $1 being the $id of udev after that match, so the string "end_device-X:Y:Z") to obtain the bay ID.

Thanks,
JF

On 12/11/14 14:05, SCHAER Frederic wrote:
> Hi,
>
> I'm used to RAID software giving me the failing disks' slots, and most
> often blinking the disks in the disk bays.
>
> I recently installed a DELL “6GB HBA SAS” JBOD card, said to be an LSI
> 2008 one, and I now have to identify 3 pre-failed disks (so says
> S.M.A.R.T.).
>
> Since this is an LSI, I thought I'd use MegaCli to identify the disk
> slots, but MegaCli does not see the HBA card.
>
> Then I found the LSI “sas2ircu” utility, but again, this one fails at
> giving me the disk slots (it finds the disks, serials and other details,
> but the slot is always 0).
>
> Because of this, I'm going to head over to the disk bay and unplug the
> disk which I think corresponds to the alphabetical order in Linux, and
> see if it's the correct one... But even if this is correct this time, it
> might not be next time.
>
> But this makes me wonder: how do you, fellow Ceph users, manage your
> disks if you really have JBOD servers?
>
> I can't imagine having to guess slots like that each time, and I can't
> imagine creating serial number stickers for every single disk I could
> have to manage either...
>
> Is there any specific advice regarding JBOD cards people should (not)
> use in their systems?
>
> Any magical way to “blink” a drive in Linux?
>
> Thanks && regards
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
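[Editor's note: Erik's suggestion upthread, tracking serial numbers via smartctl, can at least be automated instead of relying on stickers. The sketch below is illustrative only: it assumes smartctl's `Serial Number:` label as shown in Erik's excerpt, and the parsing is kept in its own function so it can be checked without hardware.]

```shell
#!/bin/sh
# Sketch: tabulate /dev/sdX -> serial number using smartctl -i,
# following Erik's serial-number bookkeeping approach.

serial_from_info() {
    # Parse smartctl -i output on stdin; print the serial number.
    # The label matches the "Serial Number:" line in Erik's excerpt.
    sed -n 's/^Serial Number:[[:space:]]*//p'
}

list_serials() {
    for dev in /dev/sd[a-z]; do
        [ -e "$dev" ] || continue
        printf '%s %s\n' "$dev" "$(smartctl -i "$dev" | serial_from_info)"
    done
}
```

Running `list_serials` once per node and keeping the output with the slot map gives you the serial-to-bay record that SCHAER's question asks for, without per-disk stickers.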