comment below...

> On Jan 21, 2015, at 4:17 PM, Jorgen Lundman via illumos-discuss
> <[email protected]> wrote:
>
>> Yeah it looks like I spoke too soon... I just realized you said the SSDs are
>> actually failing, not just going offline and appearing to fail.
>
> Yeah, these servers run great, until a disk dies :)
>
> 9:08am up 1098 day(s), 20:49, 1 user, load average: 0.29, 0.28, 0.25
> 9:08am up 1017 day(s), 15:07, 0 users, load average: 0.57, 0.52, 0.46
>
>> In extreme cases, a bad drive can cause POST to fail.
>
> Yes, had something like that 2-3 years ago. But the last three just took
> out the system: technically the console worked, but any command causing IO
> would hang, including reboot. Powercycle was the way out; POST takes a little
> longer looking for the dead device, but eventually carries on.
>
>> NB, also in those cases, no permanent damage was done and a zpool scrub
>> showed no data loss :-)
>
> No, not lost any data, although, since a dying disk forces a reboot, there
> is some customer outage. If it happens to be a mail server, dovecot can
> leave lock files around (of course, letting the NFSv4 client 'do its thing'
> generally ends up correct, even if it takes 10-15 mins to sort itself out).
>
>> But do you have SATA expanders in the system? Those are known to be toxic.
>
> I believe the LSI sas2008 is SAS all the way, but the SSDs are straight
> SATA Intel 360 (IIRC).
>
> If mpt_sas was never released, are there controllers with software that was
> released? These are generic Supermicro ~20 to ~40 disk servers, and the LSI
> card is added separately (for JBOD). So it wouldn't be impossible to change
> controller. But if it's more of a general problem with SATA then it
> wouldn't matter.
>
> Even though I have failed to replicate the failure case, I will do the
> same cut-wire-test with IllumOS, to at least make sure it is no worse.
> Annoyingly, I had 'cleverly' written the OmniOS 'dd' image to an SSD (for
> speed, and I have lots to play with), only to find it fails to boot
> (root-assembly:media's mount_media can't find the volume when it's an SSD),
> so I will retry today with a plain USB stick.

The predecessor was notorious for taking out expanders. Not much the OS can do
when that happens except try to reset/retry. NB, out of the box the timeout
for retry is 60 seconds.
 -- richard

> Lund
>
> --
> Jorgen Lundman       | <[email protected]>
> Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
> Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
> Japan                | +81 (0)3 -3375-1767 (home)
>
> -------------------------------------------
> illumos-discuss
> Archives: https://www.listbox.com/member/archive/182180/=now
> RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175743-23d1427b
> Modify Your Subscription: https://www.listbox.com/member/?&
> Powered by Listbox: http://www.listbox.com
