Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-12-11 Thread SCHAER Frederic
Hi,

Back on this.
I finally found out a logic in the mapping.

So after taking the time to note all the disks' serial numbers on 3 different 
machines and 2 different OSes, I now know that my specific LSI SAS 2008 cards 
(no reference on them, but I think they are LSI SAS 9207-8i) map the disks of 
the MD1000 in reverse alphabetical order :

sd{b..p} map to slot{14..0}
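
In shell terms, the observed mapping boils down to something like this (just a 
sketch, and only valid as long as that reverse ordering holds on these cards):

# print the MD1000 slot for a given sdX, given sd{b..p} -> slot{14..0}
slot_of() {
    local letter=${1#sd}                                  # "sdb" -> "b"
    echo $(( 14 - ( $(printf '%d' "'$letter") - 98 ) ))   # b -> 14 ... p -> 0
}
slot_of sdb    # 14
slot_of sdp    # 0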

There is absolutely nothing else that appears usable, except the sas_address of 
the disks, which seems to be associated with the slots. 
But even that differs from machine to machine, and the address <-> slot 
mapping is not obvious, to say the least...

Good thing is that I now know that fun tools exist in packages such as 
sg3_utils, smp_utils and others like mpt-status...
Next step is to try an md1200 ;)
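
For the record, with sg3_utils something along these lines should be able to 
blink a slot LED on enclosures that expose SES (sketch only; the /dev/sg device 
and the exact option syntax depend on your enclosure and sg3_utils version):

# find the enclosure's sg device first, e.g. with "lsscsi -g" or "sg_map -i"
sg_ses --index=7 --set=locate /dev/sg5      # locate LED on for element 7
sg_ses --index=7 --clear=locate /dev/sg5    # and off again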

Thanks again
Cheers

-Original Message-
From: JF Le Fillâtre [mailto:jean-francois.lefilla...@uni.lu] 
Sent: Wednesday, 19 November 2014 13:42
To: SCHAER Frederic
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ?


Hello again,

So whatever magic allows the Dell MD1200 to report the slot position for
each disk isn't present in your JBODs. Time for something else.

There are two sides to your problem:

1) Identifying which disk is where in your JBOD

Quite easy. Again I'd go for a udev rule + script that will either
rename the disks entirely, or create a symlink with a name like
"jbodX-slotY" or something to figure out easily which is which. The
mapping end-device-to-slot can be static in the script, so you need to
identify once the order in which the kernel scans the slots and then you
can map.
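
Something like this, for instance (untested sketch, the names and the table are 
made up; fill it in from your own one-time survey):

# /etc/udev/rules.d/60-jbod-slots.rules
KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*", PROGRAM="/usr/local/bin/jbod-slot-name $id", SYMLINK+="%c"

with /usr/local/bin/jbod-slot-name being something like:

#!/bin/bash
# $1 is the end_device-X:Y:Z string that udev passes as $id after the
# KERNELS match; print the symlink name for that slot.
case "$1" in
    end_device-1:1:0) echo jbod1-slot14 ;;
    end_device-1:1:1) echo jbod1-slot13 ;;
    # ... one line per slot, taken from the survey ...
    *) exit 1 ;;
esac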

But it won't survive a disk swap or a change of scanning order from a
kernel upgrade, so it's not enough.

2) Finding a way of identification independent of hot-plugs and scan order

That's the tricky part. If you remove a disk from your JBOD and replace
it with another one, the other one will get another "sdX" name, and in
my experience even another "end_device-..." name. But given that you
want the new disk to have the exact same name or symlink as the previous
one, you have to find something in the path of the device or (better) in
the udev attributes that is immutable.

If possible at all, it will depend on your specific hardware
combination, so you will have to try for yourself.

Suggested methodology:

1) write down the serial number of one drive in any slot, and figure out
its device name (sdX) with "smartctl -i /dev/sd..."

2) grab the detailed /sys path name and list of udev attributes:
readlink -f /sys/class/block/sdX
udevadm info --attribute-walk /dev/sdX

3) pull that disk and replace it. Check the logs to see which is its new
device name (sdY)

4) rerun the commands from #2 with sdY

5) compare the outputs and find something in the path or in the
attributes that didn't change and is unique to that disk (ie not a
common parent for example).

If you have something that really didn't change, you're in luck. Either
use the serial numbers, or unplug and replug all disks one by one to
figure out the slot-number-to-immutable-item mapping.

Then write the udev rule. :)
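
For example, if step 5 shows that (say) the sas_address attribute visible on 
the end_device parent stays constant per slot across disk swaps, the rule could 
be as simple as one line per slot (the addresses below are made up):

# /etc/udev/rules.d/61-jbod-slots.rules
KERNEL=="sd*[a-z]", ATTRS{sas_address}=="0x5000c50012345678", SYMLINK+="jbod1-slot05"
KERNEL=="sd*[a-z]", ATTRS{sas_address}=="0x5000c50012345679", SYMLINK+="jbod1-slot06"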

Thanks!
JF



On 19/11/14 11:29, SCHAER Frederic wrote:
> Hi
> 
> Thanks.
> I hoped it would be it, but no ;)
> 
> With this mapping :
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdb -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:1/1:0:1:0/block/sdb
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdc -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:1/end_device-1:1:1/target1:0:2/1:0:2:0/block/sdc
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdd -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:2/end_device-1:1:2/target1:0:3/1:0:3:0/block/sdd
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sde -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:3/end_device-1:1:3/target1:0:4/1:0:4:0/block/sde
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdf -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:4/end_device-1:1:4/target1:0:5/1:0:5:0/block/sdf
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdg -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:5/end_device-1:1:5/target1:0:6/1:0:6:0/block/sdg
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdh -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-19 Thread JF Le Fillâtre
 -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:3/end_device-1:2:3/target1:0:12/1:0:12:0/block/sdm
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdn -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:4/end_device-1:2:4/target1:0:13/1:0:13:0/block/sdn
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdo -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:5/end_device-1:2:5/target1:0:14/1:0:14:0/block/sdo
> lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdp -> 
> ../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:6/end_device-1:2:6/target1:0:15/1:0:15:0/block/sdp
> 
> sdd was on physical slot 12, sdk was on slot 5, and sdg was on slot 9 (and I 
> did not check the others)...
> so clearly this cannot be put in production as is and I'll have to find a way.
> 
> Regards
> 
> 
> -Original Message-
> From: Carl-Johan Schenström [mailto:carl-johan.schenst...@gu.se] 
> Sent: Monday, 17 November 2014 14:14
> To: SCHAER Frederic; Scottix; Erik Logtenberg
> Cc: ceph-users@lists.ceph.com
> Subject: RE: [ceph-users] jbod + SMART : how to identify failing disks ?
> 
> Hi!
> 
> I'm fairly sure that the link targets in /sys/class/block were correct the 
> last time I had to change a drive on a system with a Dell HBA connected to an 
> MD1000, but perhaps I was just lucky. =/
> 
> I.e.,
> 
> # ls -l /sys/class/block/sdj
> lrwxrwxrwx. 1 root root 0 17 nov 13.54 /sys/class/block/sdj -> 
> ../../devices/pci:20/:20:0a.0/:21:00.0/host7/port-7:0/expander-7:0/port-7:0:1/expander-7:2/port-7:2:6/end_device-7:2:6/target7:0:7/7:0:7:0/block/sdj
> 
> would be first port on HBA, first expander, 7th slot (6, starting from 0). 
> Don't take my word for it, though!
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-19 Thread SCHAER Frederic
Hi

Thanks.
I hoped it would be it, but no ;)

With this mapping :
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdb -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:0/end_device-1:1:0/target1:0:1/1:0:1:0/block/sdb
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdc -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:1/end_device-1:1:1/target1:0:2/1:0:2:0/block/sdc
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdd -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:2/end_device-1:1:2/target1:0:3/1:0:3:0/block/sdd
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sde -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:3/end_device-1:1:3/target1:0:4/1:0:4:0/block/sde
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdf -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:4/end_device-1:1:4/target1:0:5/1:0:5:0/block/sdf
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdg -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:5/end_device-1:1:5/target1:0:6/1:0:6:0/block/sdg
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdh -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:6/end_device-1:1:6/target1:0:7/1:0:7:0/block/sdh
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdi -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:0/expander-1:1/port-1:1:7/end_device-1:1:7/target1:0:8/1:0:8:0/block/sdi
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdj -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:0/end_device-1:2:0/target1:0:9/1:0:9:0/block/sdj
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdk -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:1/end_device-1:2:1/target1:0:10/1:0:10:0/block/sdk
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdl -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:2/end_device-1:2:2/target1:0:11/1:0:11:0/block/sdl
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdm -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:3/end_device-1:2:3/target1:0:12/1:0:12:0/block/sdm
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdn -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:4/end_device-1:2:4/target1:0:13/1:0:13:0/block/sdn
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdo -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:5/end_device-1:2:5/target1:0:14/1:0:14:0/block/sdo
lrwxrwxrwx 1 root root 0 Nov 12 12:31 /sys/class/block/sdp -> 
../../devices/pci:00/:00:04.0/:0a:00.0/host1/port-1:0/expander-1:0/port-1:0:1/expander-1:2/port-1:2:6/end_device-1:2:6/target1:0:15/1:0:15:0/block/sdp

sdd was on physical slot 12, sdk was on slot 5, and sdg was on slot 9 (and I 
did not check the others)...
so clearly this cannot be put in production as is and I'll have to find a way.

Regards


-Original Message-
From: Carl-Johan Schenström [mailto:carl-johan.schenst...@gu.se] 
Sent: Monday, 17 November 2014 14:14
To: SCHAER Frederic; Scottix; Erik Logtenberg
Cc: ceph-users@lists.ceph.com
Subject: RE: [ceph-users] jbod + SMART : how to identify failing disks ?

Hi!

I'm fairly sure that the link targets in /sys/class/block were correct the last 
time I had to change a drive on a system with a Dell HBA connected to an 
MD1000, but perhaps I was just lucky. =/

I.e.,

# ls -l /sys/class/block/sdj
lrwxrwxrwx. 1 root root 0 17 nov 13.54 /sys/class/block/sdj -> 
../../devices/pci:20/:20:0a.0/:21:00.0/host7/port-7:0/expander-7:0/port-7:0:1/expander-7:2/port-7:2:6/end_device-7:2:6/target7:0:7/7:0:7:0/block/sdj

would be first port on HBA, first expander, 7th slot (6, starting from 0). 
Don't take my word for it, though!

-- 
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst /
The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769


From: ceph-users  on behalf of SCHAER 
Frederic 
Sent: Friday, November 14, 2014 17:24
To: Scottix; Erik Logtenberg
Cc: ceph-u

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-18 Thread SCHAER Frederic
Wow. Thanks
Not very operations friendly though…

Wouldn’t it be just OK to pull the disk that we think is the bad one, check the 
serial number, and if it isn’t the right one, just replug it and let the udev rules 
do their job and re-insert the disk into the Ceph cluster ?
(provided XFS doesn’t freeze for good when we do that)

Regards

From: Craig Lewis [mailto:cle...@centraldesktop.com]
Sent: Monday, 17 November 2014 22:32
To: SCHAER Frederic
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ?

I use `dd` to force activity to the disk I want to replace, and watch the 
activity lights.  That only works if your disks aren't 100% busy.  If they are, 
stop the ceph-osd daemon, and see which drive stops having activity.  Repeat 
until you're 100% confident that you're pulling the right drive.

On Wed, Nov 12, 2014 at 5:05 AM, SCHAER Frederic 
mailto:frederic.sch...@cea.fr>> wrote:
Hi,

I’m used to RAID software giving me the failing disks  slots, and most often 
blinking the disks on the disk bays.
I recently installed a  DELL “6GB HBA SAS” JBOD card, said to be an LSI 2008 
one, and I now have to identify 3 pre-failed disks (so says S.M.A.R.T) .

Since this is an LSI, I thought I’d use MegaCli to identify the disks slot, but 
MegaCli does not see the HBA card.
Then I found the LSI “sas2ircu” utility, but again, this one fails at giving me 
the disk slots (it finds the disks, serials and others, but slot is always 0)
Because of this, I’m going to head over to the disk bay and unplug the disk 
which I think corresponds to the alphabetical order in linux, and see if it’s 
the correct one…. But even if this is correct this time, it might not be next 
time.

But this makes me wonder : how do you guys, Ceph users, manage your disks if 
you really have JBOD servers ?
I can’t imagine having to guess slots like that each time, and I can’t imagine 
creating serial number stickers for every single disk I could have to 
manage, either …
Is there any specific advice regarding JBOD cards people should (not) use in 
their systems ?
Any magical way to “blink” a drive in linux ?

Thanks && regards

___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-17 Thread Christian Balzer

On Mon, 17 Nov 2014 13:31:57 -0800 Craig Lewis wrote:

> I use `dd` to force activity to the disk I want to replace, and watch the
> activity lights.  That only works if your disks aren't 100% busy.  If
> they are, stop the ceph-osd daemon, and see which drive stops having
> activity. Repeat until you're 100% confident that you're pulling the
> right drive.
>
I use smartctl for lighting up the disk, but same diff. 
JBOD can become a big PITA quickly with large deployments and if you don't
have people with sufficient skill doing disk replacements.
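
(Something crude like

  while true; do smartctl -a /dev/sdX > /dev/null; done

does the trick; run it until the activity LED has been spotted, then Ctrl-C.)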

Also depending on how a disk died you might not be able to reclaim the
drive ID (sdc for example) without a reboot, making things even more
confusing. 

Some RAID cards in IT/JBOD mode _will_ actually light up the fail LED if
a disk fails and/or have tools to blink a specific disk. 
However, with the latter, the task of matching a disk from the controller's
perspective to what Linux enumerated it as is still on you.

Ceph might scale up to really large deployments, but you had better have a
well-staffed data center to go with that, or deploy it in a non-JBOD
fashion. 

Christian

> On Wed, Nov 12, 2014 at 5:05 AM, SCHAER Frederic 
> wrote:
> 
> >  Hi,
> >
> >
> >
> > I’m used to RAID software giving me the failing disks  slots, and most
> > often blinking the disks on the disk bays.
> >
> > I recently installed a  DELL “6GB HBA SAS” JBOD card, said to be an LSI
> > 2008 one, and I now have to identify 3 pre-failed disks (so says
> > S.M.A.R.T) .
> >
> >
> >
> > Since this is an LSI, I thought I’d use MegaCli to identify the disks
> > slot, but MegaCli does not see the HBA card.
> >
> > Then I found the LSI “sas2ircu” utility, but again, this one fails at
> > giving me the disk slots (it finds the disks, serials and others, but
> > slot is always 0)
> >
> > Because of this, I’m going to head over to the disk bay and unplug the
> > disk which I think corresponds to the alphabetical order in linux, and
> > see if it’s the correct one…. But even if this is correct this time,
> > it might not be next time.
> >
> >
> >
> > But this makes me wonder : how do you guys, Ceph users, manage your
> > disks if you really have JBOD servers ?
> >
> > I can’t imagine having to guess slots like that each time, and I can’t
> > imagine creating serial number stickers for every single disk
> > I could have to manage, either …
> >
> > Is there any specific advice regarding JBOD cards people should (not)
> > use in their systems ?
> >
> > Any magical way to “blink” a drive in linux ?
> >
> >
> >
> > Thanks && regards
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >


-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-17 Thread Cedric Lemarchand
Sorry, I forgot to say that under "Slot X/device/block" you can find the
device name, like "sdc".
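
So a quick way to dump the whole slot -> device mapping could be something like 
this (sketch, the sysfs layout may differ on your setup):

for s in /sys/class/enclosure/*/Slot*; do
    printf '%s -> %s\n' "$(basename "$s")" "$(ls "$s/device/block" 2>/dev/null)"
done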

Cheers



On 18/11/2014 00:15, Cedric Lemarchand wrote:
> Hi,
>
> Try looking for a file named "locate" in a folder named "Slot X", where X is
> the number of the slot; echoing 1 into the "locate" file will make
> the LED blink:
>
> # find /sys -name "locate"  |grep Slot
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 01/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 02/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 03/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 04/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 05/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 06/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 07/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 08/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 09/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 10/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 11/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 12/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 13/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 14/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 15/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 16/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 17/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 18/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 19/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 20/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 21/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 22/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 23/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 24/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 25/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 26/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
> 27/locate
> /sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/en

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-17 Thread Cedric Lemarchand
Hi,

Try looking for a file named "locate" in a folder named "Slot X", where X is the
number of the slot; echoing 1 into the "locate" file will make the
LED blink:

# find /sys -name "locate"  |grep Slot
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
01/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
02/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
03/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
04/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
05/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
06/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
07/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
08/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
09/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
10/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
11/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
12/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
13/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
14/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
15/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
16/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
17/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
18/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
19/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
20/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
21/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
22/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
23/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
24/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
25/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
26/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
27/locate
/sys/devices/pci:00/:00:03.0/:06:00.0/host6/port-6:0/expander-6:0/port-6:0:28/end_device-6:0:28/target6:0:28/6:0:28:0/enclosure/6:0:28:0/Slot
28/locate

This is an LSI 9200-8e with a 28-slot Supermicro JBOD, Ubuntu 12.04, 3.13 kernel.
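
For example, to blink slot 7 here and turn it off again (sketch; adjust the slot 
name and assume the pattern matches a single file):

echo 1 > "$(find /sys -path '*enclosure*/Slot 07/locate')"
echo 0 > "$(find /sys -path '*enclosure*/Slot 07/locate')"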

Cheers


On 12/11/2014 14:05, SCHAER Frederic wrote:
>
> Hi,
>
>  
>
> I’m used to RAID software giving me the failing disks  slot

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-17 Thread Craig Lewis
I use `dd` to force activity to the disk I want to replace, and watch the
activity lights.  That only works if your disks aren't 100% busy.  If they
are, stop the ceph-osd daemon, and see which drive stops having activity.
Repeat until you're 100% confident that you're pulling the right drive.
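
For example, something along these lines keeps the target disk busy (read-only
and safe to interrupt; sdX is whichever device you suspect):

dd if=/dev/sdX of=/dev/null bs=1M iflag=direct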

On Wed, Nov 12, 2014 at 5:05 AM, SCHAER Frederic 
wrote:

>  Hi,
>
>
>
> I’m used to RAID software giving me the failing disks  slots, and most
> often blinking the disks on the disk bays.
>
> I recently installed a  DELL “6GB HBA SAS” JBOD card, said to be an LSI
> 2008 one, and I now have to identify 3 pre-failed disks (so says S.M.A.R.T)
> .
>
>
>
> Since this is an LSI, I thought I’d use MegaCli to identify the disks
> slot, but MegaCli does not see the HBA card.
>
> Then I found the LSI “sas2ircu” utility, but again, this one fails at
> giving me the disk slots (it finds the disks, serials and others, but slot
> is always 0)
>
> Because of this, I’m going to head over to the disk bay and unplug the
> disk which I think corresponds to the alphabetical order in linux, and see
> if it’s the correct one…. But even if this is correct this time, it might
> not be next time.
>
>
>
> But this makes me wonder : how do you guys, Ceph users, manage your disks
> if you really have JBOD servers ?
>
> I can’t imagine having to guess slots like that each time, and I can’t imagine
> creating serial number stickers for every single disk I could have
> to manage, either …
>
> Is there any specific advice regarding JBOD cards people should (not) use
> in their systems ?
>
> Any magical way to “blink” a drive in linux ?
>
>
>
> Thanks && regards
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-17 Thread Carl-Johan Schenström
Hi!

I'm fairly sure that the link targets in /sys/class/block were correct the last 
time I had to change a drive on a system with a Dell HBA connected to an 
MD1000, but perhaps I was just lucky. =/

I.e.,

# ls -l /sys/class/block/sdj
lrwxrwxrwx. 1 root root 0 17 nov 13.54 /sys/class/block/sdj -> 
../../devices/pci:20/:20:0a.0/:21:00.0/host7/port-7:0/expander-7:0/port-7:0:1/expander-7:2/port-7:2:6/end_device-7:2:6/target7:0:7/7:0:7:0/block/sdj

would be first port on HBA, first expander, 7th slot (6, starting from 0). 
Don't take my word for it, though!
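
If it helps, a quick way to print that expander/phy part of the link target for 
every disk at once (sketch):

for d in /sys/class/block/sd*[a-z]; do
    printf '%-5s %s\n' "$(basename "$d")" \
        "$(readlink "$d" | grep -o 'expander-[^/]*/port-[^/]*' | tail -1)"
done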

-- 
Carl-Johan Schenström
Driftansvarig / System Administrator
Språkbanken & Svensk nationell datatjänst /
The Swedish Language Bank & Swedish National Data Service
Göteborgs universitet / University of Gothenburg
carl-johan.schenst...@gu.se / +46 709 116769


From: ceph-users  on behalf of SCHAER 
Frederic 
Sent: Friday, November 14, 2014 17:24
To: Scottix; Erik Logtenberg
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ?

Hi,

Thanks for your replies :]
Indeed, I did not think about /sys/class/leds, but unfortunately I have 
nothing in there on my systems.
This is kernel related, so I presume it would be the module's duty to expose 
LEDs there (in my case, mpt2sas) ... that would indeed be welcome!

/sys/block is not of great help either, unfortunately. The last thing I 
haven't tried is to compile the Dell driver and use it instead of the kernel 
one - sigh - or an elrepo kernel...

[root@ceph0 ~]# cat 
/sys/block/sd*/../../../../sas_device/end_device-*/bay_identifier
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

Maybe a kernel bug...

Regards

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Scottix
Sent: Wednesday, 12 November 2014 18:43
To: Erik Logtenberg
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ?

I would say it depends on your system and on what the drives are connected
to. Some HBAs have a CLI tool to manage the connected drives, the way a
RAID card would.
One other method I found is that sometimes the LEDs are exposed for you:
http://fabiobaltieri.com/2011/09/21/linux-led-subsystem/ has an
article on /sys/class/leds, but there is no guarantee.

On my laptop I could turn on lights and such, but our server didn't
have anything. Seems like a feature either Linux or smartctl should
have. I have run into this problem before and used a couple of tricks to
figure it out.

I guess the best solution is just to track the drives' serial numbers. Maybe
a good note to have in the Ceph cluster docs so people are aware of it.

On Wed, Nov 12, 2014 at 9:06 AM, Erik Logtenberg  wrote:
> I have no experience with the DELL SAS controller, but usually the
> advantage of using a simple controller (instead of a RAID card) is that
> you can use full SMART directly.
>
> $ sudo smartctl -a /dev/sda
>
> === START OF INFORMATION SECTION ===
> Device Model: INTEL SSDSA2BW300G3H
> Serial Number:PEPR2381003E300EGN
>
> Personally, I make sure that I know which serial number drive is in
> which bay, so I can easily tell which drive I'm talking about.
>
> So you can use SMART both to notice (pre)failing disks -and- to
> physically identify them.
>
> The same smartctl command also returns the health status like so:
>
> 233 Media_Wearout_Indicator 0x0032   099   099   000Old_age   Always
>   -   0
>
> This specific SSD has 99% media lifetime left, so it's in the green. But
> it will continue to degrade gradually, and at some point it'll hit a
> percentage where I'd like to replace it. To keep an eye on the speed of
> decay, I'm graphing those SMART values in Cacti. That way I can somewhat
> predict how long a disk will last, especially SSDs, which die very
> gradually.
>
> Erik.
>
>
> On 12-11-14 14:43, JF Le Fillâtre wrote:
>>
>> Hi,
>>
>> May or may not work depending on your JBOD and the way it's identified
>> and set up by the LSI card and the kernel:
>>
>> cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier
>>
>> The weird path and the wildcards are due to the way the sysfs is set up.
>>
>> That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running
>> CentOS release 6.5.
>>
>> Note that you can make your life easier by writing an udev script that
>> will create a symlink with a sane identifier for each of your external
>> disks. If you match along the lines of
>>
>> KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*"
>>
>> then you'll just have to cat "/sys/class/sas_device/${1}/bay_identifier"
>> in a script (with $1 being the $id of udev after 

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-14 Thread SCHAER Frederic
Hi,

Thanks for your replies :]
Indeed, I did not think about /sys/class/leds, but unfortunately I have 
nothing in there on my systems.
This is kernel related, so I presume it would be the module's duty to expose 
LEDs there (in my case, mpt2sas) ... that would indeed be welcome!

/sys/block is not of great help either, unfortunately. The last thing I 
haven't tried is to compile the Dell driver and use it instead of the kernel 
one - sigh - or an elrepo kernel...

[root@ceph0 ~]# cat 
/sys/block/sd*/../../../../sas_device/end_device-*/bay_identifier
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

Maybe a kernel bug...

Regards

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Scottix
Sent: Wednesday, 12 November 2014 18:43
To: Erik Logtenberg
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] jbod + SMART : how to identify failing disks ?

I would say it depends on your system and on what the drives are connected
to. Some HBAs have a CLI tool to manage the connected drives, the way a
RAID card would.
One other method I found is that sometimes the LEDs are exposed for you:
http://fabiobaltieri.com/2011/09/21/linux-led-subsystem/ has an
article on /sys/class/leds, but there is no guarantee.

On my laptop I could turn on lights and such, but our server didn't
have anything. Seems like a feature either Linux or smartctl should
have. I have run into this problem before and used a couple of tricks to
figure it out.

I guess the best solution is just to track the drives' serial numbers. Maybe
a good note to have in the Ceph cluster docs so people are aware of it.

On Wed, Nov 12, 2014 at 9:06 AM, Erik Logtenberg  wrote:
> I have no experience with the DELL SAS controller, but usually the
> advantage of using a simple controller (instead of a RAID card) is that
> you can use full SMART directly.
>
> $ sudo smartctl -a /dev/sda
>
> === START OF INFORMATION SECTION ===
> Device Model: INTEL SSDSA2BW300G3H
> Serial Number:PEPR2381003E300EGN
>
> Personally, I make sure that I know which serial number drive is in
> which bay, so I can easily tell which drive I'm talking about.
>
> So you can use SMART both to notice (pre)failing disks -and- to
> physically identify them.
>
> The same smartctl command also returns the health status like so:
>
> 233 Media_Wearout_Indicator 0x0032   099   099   000Old_age   Always
>   -   0
>
> This specific SSD has 99% media lifetime left, so it's in the green. But
> it will continue to degrade gradually, and at some point it'll hit a
> percentage where I'd like to replace it. To keep an eye on the speed of
> decay, I'm graphing those SMART values in Cacti. That way I can somewhat
> predict how long a disk will last, especially SSDs, which die very
> gradually.
>
> Erik.
>
>
> On 12-11-14 14:43, JF Le Fillâtre wrote:
>>
>> Hi,
>>
>> May or may not work depending on your JBOD and the way it's identified
>> and set up by the LSI card and the kernel:
>>
>> cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier
>>
>> The weird path and the wildcards are due to the way the sysfs is set up.
>>
>> That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running
>> CentOS release 6.5.
>>
>> Note that you can make your life easier by writing an udev script that
>> will create a symlink with a sane identifier for each of your external
>> disks. If you match along the lines of
>>
>> KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*"
>>
>> then you'll just have to cat "/sys/class/sas_device/${1}/bay_identifier"
>> in a script (with $1 being the $id of udev after that match, so the
>> string "end_device-X:Y:Z") to obtain the bay ID.
>>
>> Thanks,
>> JF
>>
>>
>>
>> On 12/11/14 14:05, SCHAER Frederic wrote:
>>> Hi,
>>>
>>>
>>>
>>> I’m used to RAID software giving me the failing disks  slots, and most
>>> often blinking the disks on the disk bays.
>>>
>>> I recently installed a  DELL “6GB HBA SAS” JBOD card, said to be an LSI
>>> 2008 one, and I now have to identify 3 pre-failed disks (so says
>>> S.M.A.R.T) .
>>>
>>>
>>>
>>> Since this is an LSI, I thought I’d use MegaCli to identify the disks
>>> slot, but MegaCli does not see the HBA card.
>>>
>>> Then I found the LSI “sas2ircu” utility, but again, this one fails at
>>> giving me the disk slots (it finds the disks, serials and others, but
>>> slot is always 0)
>>>
>>> Because of this, I’m going to head over to the disk bay and unplug the
>>> disk which I thin

Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread Scottix
I would say it depends on your system and on what the drives are connected
to. Some HBAs have a CLI tool to manage the connected drives, the way a
RAID card would.
One other method I found is that sometimes the LEDs are exposed for you:
http://fabiobaltieri.com/2011/09/21/linux-led-subsystem/ has an
article on /sys/class/leds, but there is no guarantee.
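
If your driver does register something there, it would look roughly like this 
(sketch; the LED name is hypothetical):

ls /sys/class/leds/
echo 1 > /sys/class/leds/SOME_LED_NAME/brightness    # on
echo 0 > /sys/class/leds/SOME_LED_NAME/brightness    # off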

On my laptop I could turn on lights and such, but our server didn't
have anything. Seems like a feature either Linux or smartctl should
have. I have run into this problem before and used a couple of tricks to
figure it out.

I guess the best solution is just to track the drives' serial numbers. Maybe
a good note to have in the Ceph cluster docs so people are aware of it.

On Wed, Nov 12, 2014 at 9:06 AM, Erik Logtenberg  wrote:
> I have no experience with the DELL SAS controller, but usually the
> advantage of using a simple controller (instead of a RAID card) is that
> you can use full SMART directly.
>
> $ sudo smartctl -a /dev/sda
>
> === START OF INFORMATION SECTION ===
> Device Model: INTEL SSDSA2BW300G3H
> Serial Number:PEPR2381003E300EGN
>
> Personally, I make sure that I know which serial number drive is in
> which bay, so I can easily tell which drive I'm talking about.
>
> So you can use SMART both to notice (pre)failing disks -and- to
> physically identify them.
>
> The same smartctl command also returns the health status like so:
>
> 233 Media_Wearout_Indicator 0x0032   099   099   000Old_age   Always
>   -   0
>
> This specific SSD has 99% media lifetime left, so it's in the green. But
> it will continue to degrade gradually, and at some point it'll hit a
> percentage where I'd like to replace it. To keep an eye on the speed of
> decay, I'm graphing those SMART values in Cacti. That way I can somewhat
> predict how long a disk will last, especially SSDs, which die very
> gradually.
>
> Erik.
>
>
> On 12-11-14 14:43, JF Le Fillâtre wrote:
>>
>> Hi,
>>
>> May or may not work depending on your JBOD and the way it's identified
>> and set up by the LSI card and the kernel:
>>
>> cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier
>>
>> The weird path and the wildcards are due to the way the sysfs is set up.
>>
>> That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running
>> CentOS release 6.5.
>>
>> Note that you can make your life easier by writing an udev script that
>> will create a symlink with a sane identifier for each of your external
>> disks. If you match along the lines of
>>
>> KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*"
>>
>> then you'll just have to cat "/sys/class/sas_device/${1}/bay_identifier"
>> in a script (with $1 being the $id of udev after that match, so the
>> string "end_device-X:Y:Z") to obtain the bay ID.
>>
>> Thanks,
>> JF
>>
>>
>>
>> On 12/11/14 14:05, SCHAER Frederic wrote:
>>> Hi,
>>>
>>>
>>>
>>> I’m used to RAID software giving me the failing disks  slots, and most
>>> often blinking the disks on the disk bays.
>>>
>>> I recently installed a  DELL “6GB HBA SAS” JBOD card, said to be an LSI
>>> 2008 one, and I now have to identify 3 pre-failed disks (so says
>>> S.M.A.R.T) .
>>>
>>>
>>>
>>> Since this is an LSI, I thought I’d use MegaCli to identify the disks
>>> slot, but MegaCli does not see the HBA card.
>>>
>>> Then I found the LSI “sas2ircu” utility, but again, this one fails at
>>> giving me the disk slots (it finds the disks, serials and others, but
>>> slot is always 0)
>>>
>>> Because of this, I’m going to head over to the disk bay and unplug the
>>> disk which I think corresponds to the alphabetical order in linux, and
>>> see if it’s the correct one…. But even if this is correct this time, it
>>> might not be next time.
>>>
>>>
>>>
>>> But this makes me wonder : how do you guys, Ceph users, manage your
>>> disks if you really have JBOD servers ?
>>>
>>> I can’t imagine having to guess slots like that each time, and I can’t
>>> imagine creating serial number stickers for every single disk I
>>> could have to manage, either …
>>>
>>> Is there any specific advice regarding JBOD cards people should (not)
>>> use in their systems ?
>>>
>>> Any magical way to “blink” a drive in linux ?
>>>
>>>
>>>
>>> Thanks && regards
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Follow Me: @Taijutsun
scot...@gmail.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread Erik Logtenberg
I have no experience with the DELL SAS controller, but usually the
advantage of using a simple controller (instead of a RAID card) is that
you can use full SMART directly.

$ sudo smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSA2BW300G3H
Serial Number:PEPR2381003E300EGN

Personally, I make sure that I know which serial number drive is in
which bay, so I can easily tell which drive I'm talking about.

So you can use SMART both to notice (pre)failing disks -and- to
physically identify them.

The same smartctl command also returns the health status like so:

233 Media_Wearout_Indicator 0x0032   099   099   000   Old_age   Always       -       0

This specific SSD has 99% media lifetime left, so it's in the green. But
it will continue to degrade gradually, and at some point it'll hit a
percentage where I'd like to replace it. To keep an eye on the speed of
decay, I'm graphing those SMART values in Cacti. That way I can somewhat
predict how long a disk will last, especially SSDs, which die very
gradually.
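
A quick way to build that bay-to-serial list is something along these lines 
(just a sketch; adjust the device glob to your setup), noting down which bay 
each drive sits in:

for d in /dev/sd?; do
    echo "== $d"
    smartctl -i "$d" | grep -E 'Device Model|Serial Number'
done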

Erik.


On 12-11-14 14:43, JF Le Fillâtre wrote:
> 
> Hi,
> 
> May or may not work depending on your JBOD and the way it's identified
> and set up by the LSI card and the kernel:
> 
> cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier
> 
> The weird path and the wildcards are due to the way the sysfs is set up.
> 
> That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running
> CentOS release 6.5.
> 
> Note that you can make your life easier by writing an udev script that
> will create a symlink with a sane identifier for each of your external
> disks. If you match along the lines of
> 
> KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*"
> 
> then you'll just have to cat "/sys/class/sas_device/${1}/bay_identifier"
> in a script (with $1 being the $id of udev after that match, so the
> string "end_device-X:Y:Z") to obtain the bay ID.
> 
> Thanks,
> JF
> 
> 
> 
> On 12/11/14 14:05, SCHAER Frederic wrote:
>> Hi,
>>
>>  
>>
>> I’m used to RAID software giving me the failing disks  slots, and most
>> often blinking the disks on the disk bays.
>>
>> I recently installed a  DELL “6GB HBA SAS” JBOD card, said to be an LSI
>> 2008 one, and I now have to identify 3 pre-failed disks (so says
>> S.M.A.R.T) .
>>
>>  
>>
>> Since this is an LSI, I thought I’d use MegaCli to identify the disks
>> slot, but MegaCli does not see the HBA card.
>>
>> Then I found the LSI “sas2ircu” utility, but again, this one fails at
>> giving me the disk slots (it finds the disks, serials and others, but
>> slot is always 0)
>>
>> Because of this, I’m going to head over to the disk bay and unplug the
>> disk which I think corresponds to the alphabetical order in linux, and
>> see if it’s the correct one…. But even if this is correct this time, it
>> might not be next time.
>>
>>  
>>
>> But this makes me wonder : how do you guys, Ceph users, manage your
>> disks if you really have JBOD servers ?
>>
>> I can’t imagine having to guess slots like that each time, and I can’t
>> imagine creating serial number stickers for every single disk I
>> could have to manage, either …
>>
>> Is there any specific advice regarding JBOD cards people should (not)
>> use in their systems ?
>>
>> Any magical way to “blink” a drive in linux ?
>>
>>  
>>
>> Thanks && regards
>>
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread JF Le Fillâtre

Hi,

May or may not work depending on your JBOD and the way it's identified
and set up by the LSI card and the kernel:

cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier

The weird path and the wildcards are due to the way the sysfs is set up.

That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running
CentOS release 6.5.

Note that you can make your life easier by writing an udev script that
will create a symlink with a sane identifier for each of your external
disks. If you match along the lines of

KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*"

then you'll just have to cat "/sys/class/sas_device/${1}/bay_identifier"
in a script (with $1 being the $id of udev after that match, so the
string "end_device-X:Y:Z") to obtain the bay ID.
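
For what it's worth, a minimal sketch of that rule + script pair (untested, the 
names are made up, and it obviously only helps where bay_identifier is actually 
populated):

# /etc/udev/rules.d/60-bay-id.rules
KERNEL=="sd*[a-z]", KERNELS=="end_device-*:*:*", PROGRAM="/usr/local/bin/bay-name $id", SYMLINK+="bay-%c"

with /usr/local/bin/bay-name being:

#!/bin/bash
# $1 = the end_device-X:Y:Z string matched above; print its bay number.
cat "/sys/class/sas_device/$1/bay_identifier" 2>/dev/null || exit 1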

Thanks,
JF



On 12/11/14 14:05, SCHAER Frederic wrote:
> Hi,
> 
>  
> 
> I’m used to RAID software giving me the failing disks  slots, and most
> often blinking the disks on the disk bays.
> 
> I recently installed a  DELL “6GB HBA SAS” JBOD card, said to be an LSI
> 2008 one, and I now have to identify 3 pre-failed disks (so says
> S.M.A.R.T) .
> 
>  
> 
> Since this is an LSI, I thought I’d use MegaCli to identify the disks
> slot, but MegaCli does not see the HBA card.
> 
> Then I found the LSI “sas2ircu” utility, but again, this one fails at
> giving me the disk slots (it finds the disks, serials and others, but
> slot is always 0)
> 
> Because of this, I’m going to head over to the disk bay and unplug the
> disk which I think corresponds to the alphabetical order in linux, and
> see if it’s the correct one…. But even if this is correct this time, it
> might not be next time.
> 
>  
> 
> But this makes me wonder : how do you guys, Ceph users, manage your
> disks if you really have JBOD servers ?
> 
> I can’t imagine having to guess slots like that each time, and I can’t
> imagine creating serial number stickers for every single disk I
> could have to manage, either …
> 
> Is there any specific advice regarding JBOD cards people should (not)
> use in their systems ?
> 
> Any magical way to “blink” a drive in linux ?
> 
>  
> 
> Thanks && regards
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com