Steve Shockley wrote:
> RedShift wrote:
>> Anyone got any similar experiences with hardware RAID cards? Hardware
>> RAID has always been misery for me.
>
> I've had two instances where older Adaptec RAID cards had a disk failure
> and then reverted to a week-old copy of the data. I'm not quite sure
> how that's possible, but having it happen on two different machines, at
> two different employers, in two different brands of servers (Dell, HP
> Netserver) made me a real believer in Adaptec.
>
> I've had generally good luck with Compaq/HP and LSI controllers.

I'll give you a fairly realistic possible explanation:

For whatever reason, at least some SCSI drives in at least some RAID
systems (I saw a lot of it on SCSI Dell PERC cards) will just hop
off-line. Nothing is wrong with the drive: if you pull the drive out and
put it back in, either in SW or physically, it will go back on-line and
happily rebuild.

Curiously, these same PERC cards also lacked any kind of beeper to let
you know they were in any kind of degraded mode. They would turn the
deactivated drive "orange" instead of green, but even that was in
dispute with one of our offices, which managed to have two drives fail
in a RAID10 array and swore there were never any orange lights on the
machine ("I was checking!! Really!").

So, I suspect that one of your drives hopped off-line and no one
noticed. A week later, the OTHER drive failed. Either the system was
then rebooted or maybe it just figured, "Hey, let's see if we can revive
that other drive" on its own, and ta-da, you are running with week-old
data.

And no, I can't prove it...

[rest of this isn't aimed at Steve...]

RAID is a complexity, and complexity is the enemy of security and
reliability. It *may* help protect against data loss. It *may* keep you
running. It *may* also be the cause of the data loss or downtime.

PROPERLY implemented, RAID can be a part of your event recovery process.
It certainly can give you performance gains. But if you don't understand
the system in your hands, it will most likely bite you hard at some
point.

Alternatives should be considered: many apps such as firewalls and DNS
servers don't need/want RAID at all, as you can "mirror" entire
MACHINES. At that point, the disk failure becomes a special case of
"system failure" and you are ready for it. RAID becomes simply an
unneeded complexity.

For many systems, L. V. Lammert's rsync system (or even dump/restore) to
a second disk in the system is wonderful.
Done properly, it can be SUPERIOR to RAID for some apps, in that it
gives you a roll-back if you make an error on a change or upgrade...and
covers a number of other "failure modes" where you wish you could "roll
back" to a previous version.

The question of "HW vs SW RAID" is wrong. The question is "understood
vs. not understood RAID solutions". I understood very well the old
Netware 3/4 software mirroring, had complete faith in it, and had the
experience to prove it in a number of cases. On the other hand, I saw a
lot of systems that were completely hosed because people DIDN'T
understand the system and expected magic to happen (or someone else to
be on call) when the system failed.

Same thing goes for HW RAID. HW RAID is "easy" to get running, but that
usually means you have NO idea how it is really working, and that makes
it less likely you will know how to get it back to a fully functional
state AFTER an event.

In most (yes, really, I'm convinced it is the vast majority) cases,
people make the error of thinking "getting it running" is the challenge.
NO!! The point of RAID (and the rest of your system) is to keep your
system serviceable AFTER something goes horribly wrong.

What happens when the system goes down hard? How do you bring the system
back to a happy state after a drive failure? What happens if you try to
stick too small a drive in? (Yes, it won't work, but how will it inform
you the new drive is one pseudo-cylinder smaller than the old ones?
Knowing that will save you major headaches when it happens at a point
where you can no longer get the exact model of drive you had in place
before...or the mfg changes the drive specs without changing the model
number (yes, that happened to a friend of mine).)

Moral: learn your RAID system. Whatever it is, you have to understand
it.

Nick.
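
P.S. For anyone who wants to see the suspected "week-old data" sequence
spelled out, here is a minimal sketch in Python. It is only a toy model
of the guess above (one mirror half drops off-line unnoticed, the other
half dies later, the stale half gets revived as-is). The names Disk and
simulate_stale_revival are made up for illustration, and nothing here
claims to reflect how any real controller firmware behaves.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    online: bool = True
    last_write_day: int = 0  # most recent day this disk received writes


def simulate_stale_revival(drop_day, fail_day):
    """Toy two-disk mirror: A drops off-line, B later fails, A is revived.

    Returns (day of the data being served afterwards, days of data lost).
    """
    a, b = Disk("A"), Disk("B")
    for day in range(1, fail_day + 1):
        # Writes only reach the disks that are currently on-line.
        for disk in (a, b):
            if disk.online:
                disk.last_write_day = day
        # At the end of drop_day, A silently hops off-line; nobody notices.
        if day == drop_day:
            a.online = False
    # B dies outright at the end of fail_day.
    b.online = False
    # Naive recovery: the stale disk is brought back without any check
    # of how far behind it is.
    a.online = True
    return a.last_write_day, b.last_write_day - a.last_write_day


if __name__ == "__main__":
    served_day, days_lost = simulate_stale_revival(drop_day=1, fail_day=8)
    print(f"serving data from day {served_day}, {days_lost} days stale")
```

The point of the toy is that nothing "fails" in the model until day 8;
the damage was done on day 1, when the array went degraded and nothing
alerted anyone.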