Re: SSD and RAID question
On 2012-09-18, at 10:25 AM, Konstantin Olchanski wrote:
> On Mon, Sep 17, 2012 at 08:46:34PM -0700, Todd And Margo Chester wrote:
>> Yes, I did miss the sda on all four partitions. Does the
>> [2/1] mean this is the first of two drives?

On 09/18/2012 11:18 AM, Christopher Tooley wrote:
> Hi, this page might give you some useful information on what mdstat
> gives you: https://raid.wiki.kernel.org/index.php/Mdstat
>
> Christopher Tooley
> ctoo...@uvic.ca
> Systems, HEP/Astronomy UVic

Hi Chris,

Awesome reference. I am about to copy it down in my reference file.

Thank you!
-T
Re: SSD and RAID question
Hi, this page might give you some useful information on what mdstat
gives you: https://raid.wiki.kernel.org/index.php/Mdstat

Christopher Tooley
ctoo...@uvic.ca
Systems, HEP/Astronomy UVic

On 2012-09-18, at 10:25 AM, Konstantin Olchanski wrote:
> On Mon, Sep 17, 2012 at 08:46:34PM -0700, Todd And Margo Chester wrote:
>> Yes, I did miss the sda on all four partitions. Does the
>> [2/1] mean this is the first of two drives?
>
> Please RTFM.
>
> Somehow you expect this list to teach you how to read the output of
> /proc/mdstat.
>
> Please believe me, you will learn much more and understand things much
> better by reading the MD/mdadm documentation first.
>
> A good place to start is the R** H** system administrators guides.
>
> --
> Konstantin Olchanski
> Data Acquisition Systems: The Bytes Must Flow!
> Email: olchansk-at-triumf-dot-ca
> Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
Re: SSD and RAID question
On Mon, Sep 17, 2012 at 08:46:34PM -0700, Todd And Margo Chester wrote:
> Yes, I did miss the sda on all four partitions. Does the
> [2/1] mean this is the first of two drives?

Please RTFM.

Somehow you expect this list to teach you how to read the output of
/proc/mdstat.

Please believe me, you will learn much more and understand things much
better by reading the MD/mdadm documentation first.

A good place to start is the R** H** system administrators guides.

--
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
Re: SSD and RAID question
On 09/17/2012 07:27 PM, Jeff Siddall wrote:
> On 09/17/2012 03:03 PM, Todd And Margo Chester wrote:
>> Hi Steven,
>>
>> Thank you! It looks like in the example that all four drives are in
>> their own RAID1 arrays, but are missing their companion drives. A
>> misconfiguration perhaps?
>>
>> -T
>
> Nope. There are 4 RAID 1 arrays on 4 separate partitions on the same
> drive. When the drive disappears all 4 arrays are degraded.
>
> Jeff

Yes, I did miss the sda on all four partitions. Does the [2/1] mean
this is the first of two drives?
Re: SSD and RAID question
On Sun, 16 Sep 2012, Todd And Margo Chester wrote:
> On 09/10/2012 12:41 PM, Jeff Siddall wrote:
>> On 09/10/2012 02:52 PM, Todd And Margo Chester wrote:
>>> On 09/10/2012 10:05 AM, Jeff Siddall wrote:
>>>> ME software RAID1 is very reliable
>>>
>>> Have you had a software RAID failure? What was the alert? And, what
>>> did you have to do to repair it?
>>
>> Never had a "software" failure. I have had [too] many hardware
>> failures, and those show up with the standard MD email alerts (example
>> attached below).
>>
>> Jeff
>>
>> --
>> This is an automatically generated mail message from mdadm
>>
>> A DegradedArray event had been detected on md device /dev/md1.
>>
>> Faithfully yours, etc.
>>
>> P.S. The /proc/mdstat file currently contains the following:
>>
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md3 : active raid1 sda5[0]
>>       371727936 blocks [2/1] [U_]
>> md0 : active raid1 sda1[0]
>>       104320 blocks [2/1] [U_]
>> md2 : active raid1 sda3[0]
>>       2096384 blocks [2/1] [U_]
>> md1 : active raid1 sda2[0]
>>       16779776 blocks [2/1] [U_]
>> unused devices:
>
> Hi Jeff,
>
> Thank you. I do not understand what I am looking at. All four entries
> are RAID1, meaning two drives in the array. But what two drives go
> together? What does the "[U_]" stand for? Up? Should md1 be [D_] for
> down? What does [2/1] stand for? And, just out of curiosity, is it
> possible to have a hot spare with the above arrangement?
>
> -T

On 09/17/2012 12:22 AM, Steven J. Yellin wrote:
> I believe that this is the interpretation of /proc/mdstat. Consider,
> for example,
>
>     md2 : active raid1 sda3[0]
>           2096384 blocks [2/1] [U_]
>
> The device is /dev/md2. It is raid1, meaning partitions on two disk
> drives mirror each other. One of the two is /dev/sda3; the other isn't
> given because something went wrong, but I'll guess it would be /dev/sdb3
> if sdb were working, and you would also have in the md2 line an
> identification of the second participant in the mirror, "sdb3[1]". The
> mirrored partitions have 2096384 blocks, which I think means about 2 GB.
> The "[2/1]" means there should be 2 disks in md2, but there is actually
> 1. The "[U_]" would be "[UU]" if md2 were in perfect working order, but
> the second drive in md2 is absent. Perhaps "U" stands for "up", or for
> "available" in some language other than English.
>
> You can have a hot spare. If the md2 line read
>
>     md2 : active raid1 sdc3[2] sdb3[1] sda3[0]
>
> then the mirror would still consist of sda3 and sdb3. You can tell
> sdc3 is the spare because of its "[2]"; only [0] and [1] are needed for
> successful mirroring.
>
> Steven Yellin

Hi Steven,

Thank you! It looks like in the example that all four drives are in
their own RAID1 arrays, but are missing their companion drives. A
misconfiguration perhaps?

-T
Re: SSD and RAID question
I believe that this is the interpretation of /proc/mdstat. Consider, for
example,

    md2 : active raid1 sda3[0]
          2096384 blocks [2/1] [U_]

The device is /dev/md2. It is raid1, meaning partitions on two disk
drives mirror each other. One of the two is /dev/sda3; the other isn't
given because something went wrong, but I'll guess it would be /dev/sdb3
if sdb were working, and you would also have in the md2 line an
identification of the second participant in the mirror, "sdb3[1]". The
mirrored partitions have 2096384 blocks, which I think means about 2 GB.
The "[2/1]" means there should be 2 disks in md2, but there is actually
1. The "[U_]" would be "[UU]" if md2 were in perfect working order, but
the second drive in md2 is absent. Perhaps "U" stands for "up", or for
"available" in some language other than English.

You can have a hot spare. If the md2 line read

    md2 : active raid1 sdc3[2] sdb3[1] sda3[0]

then the mirror would still consist of sda3 and sdb3. You can tell sdc3
is the spare because of its "[2]"; only [0] and [1] are needed for
successful mirroring.

Steven Yellin

On Sun, 16 Sep 2012, Todd And Margo Chester wrote:
> On 09/10/2012 12:41 PM, Jeff Siddall wrote:
>> On 09/10/2012 02:52 PM, Todd And Margo Chester wrote:
>>> On 09/10/2012 10:05 AM, Jeff Siddall wrote:
>>>> ME software RAID1 is very reliable
>>>
>>> Have you had a software RAID failure? What was the alert? And, what
>>> did you have to do to repair it?
>>
>> Never had a "software" failure. I have had [too] many hardware
>> failures, and those show up with the standard MD email alerts (example
>> attached below).
>>
>> Jeff
>>
>> --
>> This is an automatically generated mail message from mdadm
>>
>> A DegradedArray event had been detected on md device /dev/md1.
>>
>> Faithfully yours, etc.
>>
>> P.S. The /proc/mdstat file currently contains the following:
>>
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md3 : active raid1 sda5[0]
>>       371727936 blocks [2/1] [U_]
>> md0 : active raid1 sda1[0]
>>       104320 blocks [2/1] [U_]
>> md2 : active raid1 sda3[0]
>>       2096384 blocks [2/1] [U_]
>> md1 : active raid1 sda2[0]
>>       16779776 blocks [2/1] [U_]
>> unused devices:
>
> Hi Jeff,
>
> Thank you. I do not understand what I am looking at. All four entries
> are RAID1, meaning two drives in the array. But what two drives go
> together? What does the "[U_]" stand for? Up? Should md1 be [D_] for
> down? What does [2/1] stand for? And, just out of curiosity, is it
> possible to have a hot spare with the above arrangement?
>
> -T
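Steven's reading of the "[n/m]" count and the "[U_]" status field can be
turned into a quick scripted check. A minimal sketch (the helper name and
sample lines are illustrative, not from the thread; on a live system you
would pipe in `cat /proc/mdstat` instead):

```shell
# Count degraded md arrays in mdstat-style output: per the interpretation
# above, the status field ends in "_]" when a mirror member is missing.
count_degraded() {
    awk '/blocks/ && /\[[0-9]+\/[0-9]+\]/ && /_\]/ { n++ } END { print n+0 }'
}

# The thread's example line is degraded; a healthy mirror is not:
printf '%s\n' \
    '      2096384 blocks [2/1] [U_]' \
    '      1048576 blocks [2/2] [UU]' | count_degraded
```

This only looks at the underscore, so it flags any array with a failed or
missing member regardless of how many devices the array expects.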
Re: SSD and RAID question
On 09/10/2012 12:41 PM, Jeff Siddall wrote:
> On 09/10/2012 02:52 PM, Todd And Margo Chester wrote:
>> On 09/10/2012 10:05 AM, Jeff Siddall wrote:
>>> ME software RAID1 is very reliable
>>
>> Have you had a software RAID failure? What was the alert? And, what
>> did you have to do to repair it?
>
> Never had a "software" failure. I have had [too] many hardware
> failures, and those show up with the standard MD email alerts (example
> attached below).
>
> Jeff
>
> --
> This is an automatically generated mail message from mdadm
>
> A DegradedArray event had been detected on md device /dev/md1.
>
> Faithfully yours, etc.
>
> P.S. The /proc/mdstat file currently contains the following:
>
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md3 : active raid1 sda5[0]
>       371727936 blocks [2/1] [U_]
> md0 : active raid1 sda1[0]
>       104320 blocks [2/1] [U_]
> md2 : active raid1 sda3[0]
>       2096384 blocks [2/1] [U_]
> md1 : active raid1 sda2[0]
>       16779776 blocks [2/1] [U_]
> unused devices:

Hi Jeff,

Thank you. I do not understand what I am looking at. All four entries
are RAID1, meaning two drives in the array. But what two drives go
together? What does the "[U_]" stand for? Up? Should md1 be [D_] for
down? What does [2/1] stand for? And, just out of curiosity, is it
possible to have a hot spare with the above arrangement?

-T
Re: SSD and RAID question
On 09/10/2012 02:52 PM, Todd And Margo Chester wrote:
> On 09/10/2012 10:05 AM, Jeff Siddall wrote:
>> ME software RAID1 is very reliable
>
> Have you had a software RAID failure? What was the alert? And, what
> did you have to do to repair it?

Never had a "software" failure. I have had [too] many hardware failures,
and those show up with the standard MD email alerts (example attached
below).

Jeff

--
This is an automatically generated mail message from mdadm

A DegradedArray event had been detected on md device /dev/md1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [raid6] [raid5] [raid4]
md3 : active raid1 sda5[0]
      371727936 blocks [2/1] [U_]
md0 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]
md2 : active raid1 sda3[0]
      2096384 blocks [2/1] [U_]
md1 : active raid1 sda2[0]
      16779776 blocks [2/1] [U_]
unused devices:
Re: SSD and RAID question
On 09/10/2012 02:52 PM, Todd And Margo Chester wrote:
> On 09/10/2012 10:05 AM, Jeff Siddall wrote:
>> ME software RAID1 is very reliable
>
> Have you had a software RAID failure? What was the alert? And, what
> did you have to do to repair it?

Yes, many times (> 10). Emails come from mdmonitor (including pending
failures). Only once did the disk controller/kernel get really confused;
all other times the system kept on humming along and the user never
noticed. I was able to change the disk at *my* earliest convenience, not
when the user is upset about their desktop system being down.

Usually I just swap the drive with a blank and do something like below,
of course using care to properly set src and dst:

  src=/dev/sda
  dst=/dev/sdb
  sfdisk --dump ${src} | sfdisk ${dst}   # copy partition table
  mdadm /dev/md0 -a ${dst}1              # /boot
  mdadm /dev/md1 -a ${dst}2              # /boot2
  mdadm /dev/md2 -a ${dst}3              # large lvm vg for /, swap, /var, etc.
  sleep 5                                # let /boot md0 sync
  [ $dst = /dev/sdb ] && echo -e "root (hd1,0) \n setup (hd1)" | grub
  [ $dst = /dev/sda ] && echo -e "root (hd0,0) \n setup (hd0)" | grub
Re: SSD and RAID question
On 09/10/2012 10:05 AM, Jeff Siddall wrote:
> ME software RAID1 is very reliable

Have you had a software RAID failure? What was the alert? And, what did
you have to do to repair it?

-T
Re: SSD and RAID question
On 09/05/2012 06:34 PM, jdow wrote:
> On 2012/09/05 11:38, Todd And Margo Chester wrote:
>> On 09/04/2012 12:21 PM, Konstantin Olchanski wrote:
>>>> Cherryville drives have a 1.2 million hour MTBF (mean time
>>>> between failure) and a 5 year warranty.
>>>
>>> Note that MTBF of 1.2 Mhrs (137 years?!?) is the *vendor's estimate*.
>>
>> Baloney check. 1.2 Mhrs does not mean that the device is expected

The bottom line is huge MTBF numbers are, by definition, baloney because
they are calculations based on some information which is _definitely_ not
actually tested, since no manufacturer runs their components for 137
years. If there is one thing you _can_ be sure about it is that _no_
real hard drive will actually last that long!

Case in point: I bought a bunch of WD RE2 drives a few years back. They
too had a MTBF of 1.2M hours and a 5 year warranty, and all but one had
failed within 5 years. I can also confirm that these failures were _not_
because they were running out of spec. IIRC the max. temperature for any
of them was 37 C, and all were in server chassis mounted on a concrete
floor.

If I were you I would assume _all_ components _will_ fail. If that
failure would make your life miserable, plan on having a spare. In the
case of HDs I would _always_ use RAID, and IME software RAID1 is very
reliable.

Jeff
Re: SSD and RAID question
On 2012/09/08 18:34, Todd And Margo Chester wrote:
> On 09/05/2012 03:34 PM, jdow wrote:
>> But if the real limit is related to read write cycles on the memory
>> locations you may find that temperature has little real effect on the
>> system lifetime.
>
> I did some reliability analysis for the military about 25 years ago.
> It was pretty much following general guidelines and most of it was
> baloney. What I do remember was that failures from temperature were
> not a linear curve, but an exponential one.
>
> I will strongly concur with you that heat is your enemy. What I would
> love to see, but have never seen, is a Peltier heat pump to mount hard
> drives on.
>
> -T

Um, yes, temperature is a REAL problem when it gets high. But so is a
limited number of usable rewrites for memory locations. At "reasonable"
temperatures it should last a long time if write cycle limits are not a
problem. (Relays have a cycle related lifetime limit as well as the
usual temperature limit, as well.)

And thank Ghu that we're not dealing with radiation here. That gets
ugly, fast. I had to use two generations old TTL logic without gold
doping for the Frequency Synthesizer and Distribution Unit when creating
the basic Phase 2 GPS satellite design for that reason. Ick!

{^_-}
Re: SSD and RAID question
On 09/05/2012 03:34 PM, jdow wrote:
> But if the real limit is related to read write cycles on the memory
> locations you may find that temperature has little real effect on the
> system lifetime.

I did some reliability analysis for the military about 25 years ago. It
was pretty much following general guidelines and most of it was baloney.
What I do remember was that failures from temperature were not a linear
curve, but an exponential one.

I will strongly concur with you that heat is your enemy. What I would
love to see, but have never seen, is a Peltier heat pump to mount hard
drives on.

-T
Re: SSD and RAID question
On 2012/09/05 11:38, Todd And Margo Chester wrote:
> On 09/04/2012 12:21 PM, Konstantin Olchanski wrote:
>>> Cherryville drives have a 1.2 million hour MTBF (mean time
>>> between failure) and a 5 year warranty.
>>
>> Note that MTBF of 1.2 Mhrs (137 years?!?) is the *vendor's estimate*.
>
> Baloney check. 1.2 Mhrs does not mean that the device is expected to
> last 137 years. It means that if you have 1.2 million devices in front
> of you on a test bench, you would expect one device to fail in one hour.
>
> -T

Baloney check back at you. If you have 1.2 million devices in front of
you, all operating under the same conditions as specified for the 1.2
million hours MTBF, then half of them would have failed by the end of the
1.2 million hours. Commentary here indicates those conditions are a
severe derating on the drive's transaction capacity. It does not say
much of anything about the drive's life under other conditions, because
no failure mechanism is cited. For example, if the drive is well cooled
(and that means the components inside are well cooled rather than left
in the usual mountings), the life might be far greater, simply based on
the component temperature drop. 10 C can make a large difference in
lifetime. But if the real limit is related to read/write cycles on the
memory locations, you may find that temperature has little real effect
on the system lifetime.

If I could design a system that worked off a fast normal RAID and could
buffer in the SSD RAID, with a safe automatic failover when the SSD RAID
failed, regardless of failure mode, and I needed the speed, you can bet
I'd be in there collecting statistics for the company for whom I did the
work. There is a financial incentive here, and a potential competitive
advantage, to KNOW how these drives fail. With 100 drives, after the
first 5 or 10 had died, some knowledge might be gained. And, of course,
if the drives did NOT die, even more important knowledge would be gained.

Simple MTBF under gamer type use is pretty useless for a massive
database application. And if manufacturers are not collecting the data,
there is a really good potential for competitive advantage if you
collect your own data and hold it proprietary. I betcha somebody out
there is doing this right now for Wall Street automated trading uses if
nothing else.

{^_^}
Re: SSD and RAID question
On 09/04/2012 12:21 PM, Konstantin Olchanski wrote:
> every USB3 and all except 1 brand USB2 flash drives fail within a few
> weeks

I have been selling a "few" Kanguru USB 3 flash drives as backup sticks.
So far so good. Any idea what is happening on your end? (USB3 sticks
are so fast.)
Re: SSD and RAID question
On 09/04/2012 12:21 PM, Konstantin Olchanski wrote:
>> Cherryville drives have a 1.2 million hour MTBF (mean time
>> between failure) and a 5 year warranty.
>
> Note that MTBF of 1.2 Mhrs (137 years?!?) is the *vendor's estimate*.

Baloney check. 1.2 Mhrs does not mean that the device is expected to
last 137 years. It means that if you have 1.2 million devices in front
of you on a test bench, you would expect one device to fail in one hour.

-T
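For what it's worth, the "test bench" reading above and the "137 years"
reading can both be checked against the usual constant-failure-rate
(exponential) model that vendor MTBF figures imply. A quick sketch of the
arithmetic (the model assumption is mine, not from the thread):

```shell
# With MTBF = 1.2 Mhr and a constant failure rate:
#  - a bench of 1.2 million drives sees about 1 failure per hour,
#  - yet a single drive still has a small but real chance of dying
#    inside the 5-year warranty, because MTBF is not a lifetime.
awk 'BEGIN {
    mtbf = 1200000             # rated MTBF, hours
    t    = 5 * 365 * 24        # 5-year warranty period, hours
    printf "failures/hour on a 1.2M-drive bench: %.1f\n", 1200000 / mtbf
    printf "P(one drive fails within 5 years): %.1f%%\n", 100 * (1 - exp(-t / mtbf))
}'
```

So under this model roughly 3.6% of drives fail inside the warranty,
which is consistent with MTBF being a fleet failure-rate number rather
than an expected service life.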
Re: SSD and RAID question
On Wed, Sep 05, 2012 at 10:34:24AM +0200, Sean Murray wrote:
> On 09/04/2012 09:21 PM, Konstantin Olchanski wrote:
>> On Sun, Sep 02, 2012 at 05:33:24PM -0700, Todd And Margo Chester wrote:
>>> Cherryville drives have a 1.2 million hour MTBF (mean time
>>> between failure) and a 5 year warranty.
>>
>> Note that MTBF of 1.2 Mhrs (137 years?!?) is the *vendor's estimate*.
>
> It's worse than that: if you read their docs, that number is based on
> an average write/read rate of (Intel) 20G per day, which is painfully
> little for a server.

I am not sure if I believe any of these numbers. Here is why.

I just checked my running systems and the oldest "/" filesystem on a USB
flash drive is "Nov 2010", not quite 2 years old. (Other USB flash
drives are probably older than that but have "new" filesystems from
being upgraded from SL4 to SL5 to SL6.)

USB flash drives were never intended for heavy duty use as "/" disks,
the Linux ext3/ext4 is not even supposed to be easy on flash media, and
I remember the dire predictions of flash media "wear-out".

I read all this to mean that nobody knows how long flash media really
lasts in production use. I think "they" have to fix the sudden-death
problem first, *then* we will start seeing how the media life-time and
wear-out comes into play.

--
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
Re: SSD and RAID question
On 09/04/2012 09:21 PM, Konstantin Olchanski wrote:
> On Sun, Sep 02, 2012 at 05:33:24PM -0700, Todd And Margo Chester wrote:
>> Cherryville drives have a 1.2 million hour MTBF (mean time
>> between failure) and a 5 year warranty.
>
> Note that MTBF of 1.2 Mhrs (137 years?!?) is the *vendor's estimate*.

It's worse than that: if you read their docs, that number is based on an
average write/read rate of (Intel) 20G per day, which is painfully
little for a server.

I looked at these for caching data, but for our use case, and of course
assuming the MTBF to be accurate, the MTBF would be 6 months.

Sean

> Actual failure rates observed in production are unknown, the devices
> have not been around long enough.
>
> However, if you read product feedback on newegg, you may note that
> many SSDs seem to suffer from the "sudden death" syndrome - a problem
> we happily no longer see on spinning disks.
>
> I guess the "5 year warranty" is real enough, but it does not cover
> your costs in labour for replacing dead disks, costs of down time and
> costs of lost data.
>
>> ... risk of dropping RAID in favor of just one of these drives?
>
> To help you make an informed decision, here is my data.
>
> I have about 9 SSDs in production use (most are in RAID1 pairs),
> oldest has been running since last October:
> - 1 has 3 bad blocks (no RAID1),
> - 1 has a SATA comm problem (vanishes from the system - system
>   survives because it's a RAID1 pair),
> - 0 dead so far.
>
> I have about 20 USB and CF flash drives in production used as SL4/5/6
> system disks, some in RAID1, some as singles, oldest has been in use
> for 3 (or more?) years. There are zero failures, except 1 USB flash
> has a few bad blocks, except for infant mortality (every USB3 and all
> except 1 brand USB2 flash drives fail within a few weeks).
>
> All drives used as singles are backed up nightly (rsync). All spinning
> disks are installed in RAID1 pairs.
>
> Would *I* use single drives (any technology - SSD, USB flash,
> spinning)? Only for a system that does not require 100% uptime (is not
> used by any users) and when I can do daily backups (it cannot be in a
> room without a GigE network).
Re: SSD and RAID question
On Sun, Sep 02, 2012 at 05:33:24PM -0700, Todd And Margo Chester wrote:
> Cherryville drives have a 1.2 million hour MTBF (mean time
> between failure) and a 5 year warranty.

Note that MTBF of 1.2 Mhrs (137 years?!?) is the *vendor's estimate*.

Actual failure rates observed in production are unknown, the devices
have not been around long enough.

However, if you read product feedback on newegg, you may note that many
SSDs seem to suffer from the "sudden death" syndrome - a problem we
happily no longer see on spinning disks.

I guess the "5 year warranty" is real enough, but it does not cover your
costs in labour for replacing dead disks, costs of down time and costs
of lost data.

> ... risk of dropping RAID in favor of just one of these drives?

To help you make an informed decision, here is my data.

I have about 9 SSDs in production use (most are in RAID1 pairs), oldest
has been running since last October:
- 1 has 3 bad blocks (no RAID1),
- 1 has a SATA comm problem (vanishes from the system - system survives
  because it's a RAID1 pair),
- 0 dead so far.

I have about 20 USB and CF flash drives in production used as SL4/5/6
system disks, some in RAID1, some as singles, oldest has been in use for
3 (or more?) years. There are zero failures, except 1 USB flash has a
few bad blocks, except for infant mortality (every USB3 and all except 1
brand USB2 flash drives fail within a few weeks).

All drives used as singles are backed up nightly (rsync). All spinning
disks are installed in RAID1 pairs.

Would *I* use single drives (any technology - SSD, USB flash, spinning)?
Only for a system that does not require 100% uptime (is not used by any
users) and when I can do daily backups (it cannot be in a room without a
GigE network).

--
Konstantin Olchanski
Data Acquisition Systems: The Bytes Must Flow!
Email: olchansk-at-triumf-dot-ca
Snail mail: 4004 Wesbrook Mall, TRIUMF, Vancouver, B.C., V6T 2A3, Canada
Re: SSD and RAID question
Hi Todd And Margo Chester!

On 2012.09.02 at 17:33:24 -0700, Todd And Margo Chester wrote next:
> On several Windows machines lately, I have been using
> Intel's Cherryville enterprise SSD drives. They work
> very, very well.
>
> Cherryville drives have a 1.2 million hour MTBF (mean time
> between failure) and a 5 year warranty.
>
> I have been thinking, for small business servers
> with a low data requirement, what would be the
> risk of dropping RAID in favor of just one of these
> drives?

Personally I wouldn't recommend dropping RAID on *any* server where loss
of functionality can cause you any problems. That is, if its
functionality is duplicated and a load balancer will switch to the other
server automatically - sure, deploy it without RAID; but if its loss can
cause you business troubles, it's a bad idea.

We do use SSDs for some kinds of small business servers, but we prefer
to buy at the very least two of them and use software RAID; it's fine to
use cheaper SSDs, as a RAID1 of them is still a better idea than a
single more expensive one (unless we are talking about something
ultra-reliable like an SLC drive, but those are like 10 times more
expensive).

You should understand that MTBF values or warranty time are absolutely
useless when you think about the chance of a single failure or try to
calculate how soon the problem will likely happen; taking them into
account is worth it only when you calculate cost of usage or replacement
rate for a big park of computers (say, 1000's).

If you want to calculate SSD reliability for a server task, the best
indicator would be the amount of allowed writes; in some cases, like
database journals (redo logs in Oracle / WAL logs in postgresql / etc),
you might need really lots of writes. If you calculate the value, it's
easy to check that it's impossible for a consumer SSD drive to last for
years under such load (you need an SLC drive, or at least an
enterprise-grade MLC drive like the Intel 710). There are other usage
scenarios where SLC SSDs are a must, like ZFS ZIL.

SSD reliability is okay for mostly-read usage scenarios, but the problem
is that SSDs fail in a completely different way than HDDs. Actually,
I've never seen an SSD that has run out of write cycles - but I've seen
quite a few SSDs that died from flash controller failure or something
similar to that. That is, the most likely problem with an SSD is "oops,
it died". While this happens with HDDs too, with them it's way more
likely to just get bad blocks. So MTBF for SSDs != MTBF for HDDs, as we
are talking about completely different types of common failures; you
can't even compare these numbers directly.

> Seems to me the RAID controller would have a worse
> MTBF than a Cherryville SSD drive?
>
> And, does SL 6 have trim stuff built into it?

Yes, make sure to add the "discard" option to ext4 filesystem mounts.
However, it won't work on hardware RAID and probably won't work on
software RAID either - though I'm not 100% sure of the latter. This
means that if you are going to RAID SSDs, using Sandforce-driven SSDs
isn't recommended, as you will lose lots of performance over time; use
Marvell-based solutions like the Crucial M4, Plextor M3/M3P, Intel 510,
OCZ Vertex 4 and a few others. Don't use the Marvell-based Corsair
Performance Pro with hardware RAID controllers, however, as they are
often incompatible.

--
Vladimir
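Concretely, the ext4 "discard" option mentioned above looks like this in
practice. This is a sketch; the device name and mount point are
hypothetical, and whether online discard helps or hurts depends on the
drive:

```shell
# /etc/fstab entry with online TRIM for an ext4 filesystem on an SSD:
#
#   /dev/sda2  /  ext4  defaults,noatime,discard  1 1
#
# Or enable it on an already-mounted filesystem without rebooting:
#
#   mount -o remount,discard /
#
# "lsblk --discard" (or "hdparm -I /dev/sda | grep -i trim") shows
# whether the device actually advertises TRIM; behind a hardware RAID
# controller it usually will not.
```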
Re: SSD and RAID question
On Mon, Sep 3, 2012 at 1:17 AM, Todd And Margo Chester wrote:
> Hmm. Never had a bad hardware RAID controller. Had several
> mechanical hard drives go bad.

Lord knows I have. The worst were these "they fell off my uncle's truck
in southeast Asia" adapters with the chip numbers apparently burned off
them, pretending to the kernel that they were an Adaptec chipset. They
didn't even fit properly in the cases, due to badly machined and mounted
mounting plates. I managed to get those shipped back as unacceptable:
any cost savings in getting cheaper hardware faster was completely
wasted in the testing failures, the "just wedge them in!!!" attitude,
and the resulting failures as the cards popped loose when the systems
warmed up.

I've also dealt with some high performance disk controller failures in
bleeding edge hardware. It's why I tend to avoid bleeding edge hardware:
let someone who needs that bleeding edge debug it for me.

> Anyone have an opinion(s) on SSD's in a small work group server?

They're very expensive for what are, usually, unnecessary though large
performance gains. For certain high performance proxy or database
workloads, they're invaluable.

Monitoring system performance for them can get a little weird. I've
seen system loads go over 150 on SL 6 compatible systems, though the
systems were still active and responsive and performing well.
Re: SSD and RAID question
> Hmm. Never had a bad hardware RAID controller. Had several
> mechanical hard drives go bad.
>
> Anyone have an opinion(s) on SSD's in a small work group server?

We've had very good luck with SSDs (singly on workstations or spanned
volumes on servers) as primary storage, mirroring to a spanned volume of
cheap spinning disks. Never had an SSD failure (yet), but the cheap disk
backup is live anyway so we're not too worried.

We *do* have a schedule for replacement based on the historical write
average on the SSDs, though. Eventually they will brick, so before that
we have to replace them, and planning it ahead of time is way better
than living in panic-replacement mode whenever they eventually die. At
the moment it looks like workstations won't even come close to needing
replacements before the systems are end-of-life anyway, but the servers
are a different issue (some of them are under considerable load, and
will probably require replacement every 2 years to be totally safe).

As for the OP's question about trim: trim is available as a mount
option, as are a few others that limit the tiny-write problem (like the
noatime option and putting various cache directories in tmpfs in RAM
instead of on disk, etc.) and change the way the seeks/writes are
scheduled (the default is optimized for platters, which is a
deoptimization for SSDs). You can find a wealth of information on the
net about these issues so I won't bore you with the details here.
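The replacement schedule described above is simple arithmetic on the
drive's rated endurance versus the historical write average. A sketch
(the endurance rating and write rate here are made-up placeholders, not
numbers from the poster):

```shell
# Estimate when an SSD should be rotated out, from its rated endurance
# and the observed average write rate.
rated_tbw=600     # drive endurance rating, TB written (assumed)
daily_gb=50       # observed average host writes, GB/day (assumed)

awk -v tbw="$rated_tbw" -v gb="$daily_gb" 'BEGIN {
    days = tbw * 1024 / gb
    printf "replace in ~%.0f days (%.1f years)\n", days, days / 365
}'
```

Real drives report the running total as a SMART attribute (e.g. "Total
LBAs Written"), so the daily average can be measured rather than guessed.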
Re: SSD and RAID question
On 09/02/2012 08:26 PM, Nathan wrote:
> On Sun, Sep 2, 2012 at 6:33 PM, Todd And Margo Chester
> <toddandma...@gmail.com> wrote:
>> Hi All,
>>
>> On several Windows machines lately, I have been using
>> Intel's Cherryville enterprise SSD drives. They work
>> very, very well.
>>
>> Cherryville drives have a 1.2 million hour MTBF (mean time
>> between failure) and a 5 year warranty.
>>
>> I have been thinking, for small business servers
>> with a low data requirement, what would be the
>> risk of dropping RAID in favor of just one of these
>> drives?
>>
>> Seems to me the RAID controller would have a worse
>> MTBF than a Cherryville SSD drive?
>>
>> And, does SL 6 have trim stuff built into it?
>>
>> What do you all think?
>
> In my experience, I've had more problems with hardware RAID
> controllers than any other component (hardware OR software) except for
> traditional hard drives themselves. We switched to software RAID
> (Linux) and ZFS (*BSD and Solaris) years ago. But that's just us. YMMV.
>
> ~ Nathan

Hmm. Never had a bad hardware RAID controller. Had several mechanical
hard drives go bad.

Anyone have an opinion(s) on SSD's in a small work group server?
Re: SSD and RAID question
On 2012/09/02 20:26, Nathan wrote:
> In my experience, I've had more problems with hardware RAID
> controllers than any other component (hardware OR software) except for
> traditional hard drives themselves. We switched to software RAID
> (Linux) and ZFS (*BSD and Solaris) years ago. But that's just us. YMMV.

Speaking of software raid, I have four disks that are from a RAID on a
motherboard with the Intel ICH10 controller. They were in RAID 5. The
motherboard is a "was a motherboard" for the most part. I note that the
Linux raid could read the disks in that machine.

If I stick the four disks into four USB<->SATA adapters, is it likely
the Linux raid software will be able to piece them together so I can get
the "not so critical" last few bits of effort off them that I've not
been able to keep the motherboard up long enough to get already? (The
native system on the disks was Windows 7, with which I make some real
income.)

If there's a good chance it would work, that will change my recovery
strategy a little.

{^_^}
Re: SSD and RAID question
On Sun, Sep 2, 2012 at 6:33 PM, Todd And Margo Chester
<toddandma...@gmail.com> wrote:
> Hi All,
>
> On several Windows machines lately, I have been using
> Intel's Cherryville enterprise SSD drives. They work
> very, very well.
>
> Cherryville drives have a 1.2 million hour MTBF (mean time
> between failure) and a 5 year warranty.
>
> I have been thinking, for small business servers
> with a low data requirement, what would be the
> risk of dropping RAID in favor of just one of these
> drives?
>
> Seems to me the RAID controller would have a worse
> MTBF than a Cherryville SSD drive?
>
> And, does SL 6 have trim stuff built into it?
>
> What do you all think?

In my experience, I've had more problems with hardware RAID controllers
than any other component (hardware OR software) except for traditional
hard drives themselves. We switched to software RAID (Linux) and ZFS
(*BSD and Solaris) years ago. But that's just us. YMMV.

~ Nathan
SSD and RAID question
Hi All,

On several Windows machines lately, I have been using Intel's
Cherryville enterprise SSD drives. They work very, very well.

Cherryville drives have a 1.2 million hour MTBF (mean time between
failure) and a 5 year warranty.

I have been thinking, for small business servers with a low data
requirement, what would be the risk of dropping RAID in favor of just
one of these drives?

Seems to me the RAID controller would have a worse MTBF than a
Cherryville SSD drive?

And, does SL 6 have trim stuff built into it?

What do you all think?

Many thanks,
-T