comment below...

> On Jan 21, 2015, at 4:17 PM, Jorgen Lundman via illumos-discuss 
> <[email protected]> wrote:
> 
>> Yeah it looks like I spoke too soon... I just realized you said the SSDs are 
>> actually failing, not just going offline and appearing to fail.
> 
> Yeah, these servers run great, until a disk dies :)
> 
>  9:08am  up 1098 day(s), 20:49,  1 user,  load average: 0.29, 0.28, 0.25
>  9:08am  up 1017 day(s), 15:07,  0 users,  load average: 0.57, 0.52, 0.46
> 
> 
> 
>> 
>> In extreme cases, a bad drive can cause POST to fail.
>> 
> 
> Yes, had something like that 2-3 years ago. But the last three just took
> out the system, technically console worked but any command causing IO would
> hang, including reboot. Powercycle was the way out, POST takes a little
> longer looking for the dead device, but eventually carries on.
> 
> 
>> NB, also in those cases, no permanent damage was done and a zpool scrub 
>> showed no data loss :-)
> 
> No, not lost any data, although, since a dying disk forces a reboot there
> is some customer outage. If it happens to be a mail server, dovecot can
> leave lock files around (of course, letting NFSv4 client 'do its thing'
> generally ends up correct, even if it takes 10-15 mins to sort itself out)
> 
> 
>> But do you have SATA expanders in the system?  Those are known to be toxic. 
> 
> I believe the LSI sas2008 is sas all the way, but the SSDs are straight
> SATA Intel 360 (IIRC).
> 
> If mpt_sas was never released, are there controllers with software that was
> released? These are generic Supermicro ~20 to ~40 disk servers, and the LSI
> card is added separately (for JBOD). So it wouldn't be impossible to change
> controller. But if it's more of a general problem with SATA then it
> wouldn't matter.
> 
> Even though I have failed to replicate the failure case, I will do the
> same cut-wire test with illumos, to at least make sure it is no worse.
> Annoyingly, I had 'cleverly' written the OmniOS 'dd' image to an SSD (for
> speed, and I have lots to play with), only to find it fails to boot
> (root-assembly:media's mount_media can't find the volume when it's an SSD),
> so I will retry again today with a plain USB stick.

The predecessor was notorious for taking out expanders. Not much the OS can
do when that happens except try to reset/retry. NB, out of the box the retry
timeout is 60 seconds.
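
To put the 60-second figure in perspective, here is a rough sketch of how
long a single command could block if every attempt runs to the full timeout.
The retry count is not stated anywhere in this thread; the value 5 below is
purely illustrative, and the function name is made up for this example.

```python
def worst_case_stall_seconds(timeout_s: int = 60, retries: int = 5) -> int:
    """Upper bound on how long one command can block.

    Assumes the first attempt and every retry each wait out the full
    timeout. `retries` is a hypothetical value, not taken from the thread.
    """
    return timeout_s * (1 + retries)


if __name__ == "__main__":
    # One initial attempt plus five retries, each timing out after 60 s.
    print(worst_case_stall_seconds())
```

Under those assumptions a single stuck command holds things up for minutes,
which would be consistent with commands appearing to hang.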
 -- richard

> 
> Lund
> 
> -- 
> Jorgen Lundman       | <[email protected]>
> Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
> Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
> Japan                | +81 (0)3 -3375-1767          (home)
> 
> 
> -------------------------------------------
> illumos-discuss
> Archives: https://www.listbox.com/member/archive/182180/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175743-23d1427b
> Modify Your Subscription: https://www.listbox.com/member/?&;
> Powered by Listbox: http://www.listbox.com

