On Tue, 07 Jun 2011, Miles Fidelman wrote:
> b. you're running RAID - instead of the drive dropping out of the
> array, the entire array slows down as it waits for the failing drive
> to (eventually) respond

Eh, it is worse. A failing drive _will_ drop out of the array sooner or later, and it can be very bad if it does so 'sooner' for any reason other than an imminent unit failure: there is a high probability that other device(s) will also decide to time out while the array is degraded or rebuilding, and that results in service downtime (and usually data loss).

You never want discs dropping off the array because of performance problems that are not related to an immediate failure: the chance of multiple drops causing an array failure is too high. You want to know the disk is slow, and to replace it under controlled conditions.

This problem is *common*. Don't do hardware RAID on regular consumer crap without SCT ERC support (aka TLER/CCTL/ERC), and don't buy expensive crap with buggy firmware for which the vendor, to save face, refuses to issue a public fix (but which you can get from your RAID card vendor if you are very lucky). On Linux, smartctl gives you access to the drive's SCT ERC page if it is supported.

Also, any device model (not a SPECIFIC device) for which firmware updates are available that reduce the effective throughput should be avoided like the plague. Such updates indicate that the vendor has shipped units with manufacturing or component issues, and you can never be sure of what you'll get when you buy a new one.

If you have already bought such a device, one with a known high ratio of design or manufacturing defects/weaknesses, it depends on your luck whether you got something good or a lemon. If SMART finds *NO* issues (no increase in high fly writes, no growth in reallocated sectors) and throughput tests show the expected response, you have a good one: be happy. If either test shows any such issue, remove it from production. Secure-erase it, then either apply any firmware updates and use it as throw-away backup media (make sure the data is encrypted), or send it for recycling.
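
For what it's worth, here is a minimal Python sketch of checking the SCT ERC page mentioned above. It runs `smartctl -l scterc <device>` and parses the result. The function names are made up for illustration, and the output layout it expects (a "SCT Error Recovery Control:" header followed by "Read:" and "Write:" lines in units of 100 ms) is an assumption based on what smartmontools typically prints; check it against your own version before relying on it.

```python
import re
import subprocess


def parse_scterc(output: str) -> dict:
    """Parse the text printed by `smartctl -l scterc <device>`.

    Assumed layout (not guaranteed across smartmontools versions):
        SCT Error Recovery Control:
                   Read:     70 (7.0 seconds)
                  Write:     70 (7.0 seconds)
    Timers are reported in deciseconds (units of 100 ms).
    A timer that is disabled, or not found, is returned as None.
    """
    result = {"supported": False, "read_ds": None, "write_ds": None}
    # Drives without the feature do not print the header with a colon.
    if "SCT Error Recovery Control:" not in output:
        return result
    result["supported"] = True
    for key, label in (("read_ds", "Read"), ("write_ds", "Write")):
        match = re.search(rf"^\s*{label}:\s+(\d+)\s", output, re.MULTILINE)
        if match:
            result[key] = int(match.group(1))
    return result


def query_scterc(device: str) -> dict:
    """Run smartctl against a device (usually needs root) and parse the output."""
    proc = subprocess.run(
        ["smartctl", "-l", "scterc", device],
        capture_output=True,
        text=True,
    )
    return parse_scterc(proc.stdout)
```

Calling `query_scterc("/dev/sda")` as root on a drive that reports both timers at 7 seconds should give 70 for both `read_ds` and `write_ds`. A drive that reports `"supported": False` is the kind you do not want behind a hardware RAID controller.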

Linux software RAID is much more forgiving by default (and it lets you tune the timeout for each component device separately). Most of the time it will just slow down instead of kicking component devices off the array until data loss happens. That could be useful if you got duped by the vendor and were sold a defective drive that can only operate safely out of spec, but that can still be of use to you.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh