RAID5/ext3 trouble...
Hey all,

I formerly had a gentoo box with a 4x250GB software RAID5, no LVM, ext3 filesystem, using disks sdc, sdd, sde and sdf. sda and sdb were the OS disks. I set out to move the RAID over to a new machine running debian. To make things worse, my backup of the data later turned out to be only partial.

The new disks in the system appeared as sda, sdb, sdc and sdd. I then did a series of damaging things with them:

1. I added 3 of them as spares rather than as drives with existing data.
2. When trying to assemble the array, this started rebuilding (effectively, wiping) the fourth disk. I did this several times (and I suspect I've clobbered data on more than one drive) before realizing I could sidestep this by assembling three drives at a time - which brings the array up degraded and doesn't kick off anything destructive.
3. I ran --create.
4. I cleared the persistent superblocks off the drives and attempted to recreate them.

Roughly at this point I decided this was unsalvageable and went to the backup, only to find out it had failed to back up parts of the data. A memorable moment in my life.

I then cobbled together a bunch of other hard drives to assemble a ~800GB volume, plonked an LVM on it, shared it over NFS, and made "dd if=/dev/sdX ..." copies in files of all the raw RAID drives over the LAN. At least I have a snapshot of things as they are right now before I start "fixing" things again.

I attempted assembling a, b and c (d is the one I suspect is most damaged) - the RAID5 comes up degraded with 1 drive missing - then asked mke2fs where the ext3 superblocks should lie. Half of those locations hold superblocks fsck is willing to work from; the other half doesn't. I just ran fsck -y on the md device; it's been "fixing" things for 6 hours now, and I suspect whatever is left afterwards will not be of much use. I think the RAID mechanism is getting the stripes wrong, resulting in a total mish-mash of filesystem internals for e2fsck. I can attempt fsck with each of the valid superblocks I found (about 8), and I can attempt assembling other combinations of the four physical drives.

My most serious concern is a lot of photos that lived on this box, so mild filesystem corruption affecting some percentage of the small files would be more than tolerable.

My questions:

1. Short of going to a specialist shop, is there any recommended course of action to restore data off the array? (Technical suggestions are more than welcome; recommendations for sanely-priced commercial software are also welcome.) My current plan is to salvage what I can by attempting fsck with the various valid superblocks, copy off any files that look half-sane, then try again with a different subset of the four drives. This is very time-consuming (restoring the 3 drive images takes ~10hrs over Gig ethernet) but seems the first thing to try. A rough sketch of the commands involved follows below.
2. Is there some minimally-destructive and readily available way to pool all four drives together and, rather than reconstruct one entire drive from the data on the other three, attempt reconstructing every stripe separately (based on the assumption that drive X can help reconstruct a stripe where drive Y got clobbered, and vice versa)?
3. If you think I should be flogged for not verifying the backup was A-OK before diving in, kindly take a number and stand in line. I shall be distributing low-differential SCSI cables for flogging shortly.

Thank you!
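P.S. Here's the plan above as commands, in case anyone spots a flaw. This is a sketch only: the image paths (/mnt/images) and the superblock/blocksize numbers are illustrative, and --readonly on assemble depends on your mdadm version - check the man page before running any of it.

  # Image every raw member before experimenting (done already);
  # conv=noerror,sync keeps going past bad sectors and pads them,
  # so offsets in the image stay aligned with the disk.
  dd if=/dev/sda of=/mnt/images/sda.img bs=1M conv=noerror,sync

  # Work on the images, never the disks: attach them as read-only
  # loop devices so no experiment can touch the originals.
  losetup -r /dev/loop0 /mnt/images/sda.img
  losetup -r /dev/loop1 /mnt/images/sdb.img
  losetup -r /dev/loop2 /mnt/images/sdc.img

  # Assemble a degraded array from any 3 of the 4 members; --run
  # starts it despite the missing member, --readonly blocks writes
  # (and any resync).
  mdadm --assemble --run --readonly /dev/md0 /dev/loop0 /dev/loop1 /dev/loop2

  # Ask mke2fs where the backup superblocks would land; -n only
  # prints what it would do, it does NOT create a filesystem.
  mke2fs -n /dev/md0

  # Dry-run e2fsck (-n = answer "no" to everything) against each
  # backup superblock; 32768 is the usual first backup for a
  # 4KB-block filesystem.
  e2fsck -n -b 32768 -B 4096 /dev/md0

Only once a given member-subset and superblock combination gets a clean-looking dry run would I let fsck actually write anything - and even then, to a copy of the images.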
Re: RAID5/ext3 trouble...
I suggest you read the latest summary of the reports on storage at

  http://storagemojo.com/?p=383
  http://storagemojo.com/?p=378

The bottom-line conclusions I drew from those are:

1. Do not use RAID5. Stick to RAID 1 (or 0+1).
2. Use cheap SATA storage - even build-it-yourself. The big bucks don't buy you real reliability.
3. If at all possible, implement a 3-copies-per-file scheme rather than RAID1/5/6. In the long run it is much simpler and more reliable.

I'd be glad to discuss this more, but man, keep your posts shorter :-)

Dan

On 3/5/07, Gunny Smith <[EMAIL PROTECTED]> wrote:
> [original post quoted in full - snipped]
Re: RAID5/ext3 trouble...
On 07/03/07, Dan Bar Dov <[EMAIL PROTECTED]> wrote:
> The bottom-line conclusions I drew from those are:
> 1. Do not use RAID5. Stick to RAID 1 (or 0+1).
> 2. Use cheap SATA storage - even build-it-yourself. The big bucks
> don't buy you real reliability.
> 3. If at all possible, implement a 3-copies-per-file scheme rather
> than RAID1/5/6. In the long run it is much simpler and more reliable.
>
> I'd be glad to discuss this more, but man, keep your posts shorter :-)

Thanks for the pointers.

Now - how would you suggest implementing your point 3, short of having access to an implementation of the Google File System?

I'm in a position to re-design a system which requires reliable, fault-tolerant access to files over a LAN for processing (i.e. read files, calculate stuff from their input, write other files and to a database) and for archiving. I'd rather build the solution on proven tools than re-invent a home-grown solution from scratch, but so far I haven't found something I can bet the company's fate on.

The current system is built on Windows, and though the business owner is open to thinking about a move to Linux, it will have to be gradual.

Thanks,

--Amos
Re: RAID5/ext3 trouble...
That's a good question to which I have no answer. I don't know how Google does it. I can think of:

1. a special file system
2. some kind of "scrubber" - a daemon scanning for FS changes and copying whatever changed
3. using a sync tool (rsync?) on a daily (hourly?) basis - a minimal sketch follows below

I doubt Google has (1). This is some good startup material.

Dan

On 3/18/07, Amos Shapira <[EMAIL PROTECTED]> wrote:
> [previous message quoted in full - snipped]
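P.S. The crude version of (3) really can be a few lines from cron. A minimal sketch, assuming two replica hosts called replica1 and replica2 (names invented) reachable over ssh, with the master copy under /srv/data:

  #!/bin/sh
  # Keep three copies of every file: the master, plus one full
  # replica on each of two other hosts.
  SRC=/srv/data/
  for host in replica1 replica2; do
      # -a preserves ownership/perms/times; --delete mirrors removals
      # too - drop it if you'd rather keep deleted files around as an
      # extra safety net.
      rsync -a --delete -e ssh "$SRC" "$host:/srv/data/"
  done

Run it hourly from a crontab entry (say, 0 * * * * /usr/local/bin/sync-replicas.sh - path made up). It won't protect against a bad change that propagates before you notice it, which is where snapshots or GFS-style schemes still win.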
Re: RAID5/ext3 trouble...
On 18/03/07, Dan Bar Dov <[EMAIL PROTECTED]> wrote:
> I don't know how Google does it. I can think of:
> 1. a special file system
> 2. some kind of "scrubber" - a daemon scanning for FS changes and
> copying whatever changed
> 3. using a sync tool (rsync?) on a daily (hourly?) basis
> I doubt Google has (1). This is some good startup material.

Actually, they do have (1): [1], and they rely on it a lot. But apart from describing it in that paper (and mentioning it in other publications - e.g. [2] is the latest I read, among tons of others) they don't provide the actual code, which I suppose I understand, since it could be viewed as part of their core business ("organizing the world's information").

Another simple idea I had overnight: cross-mirror disks between a couple of nodes (preferably on different racks, when we reach such a stage), either as database replication or through DRBD-style disk replication (sort of RAID 1 over the net; a config sketch follows below). Then, if one of the nodes goes down, its partner can take over handling its queue of jobs.

If the rest of the system is architected to use local data as much as possible (i.e. all the stages which process the same piece of information run on the same node), this might be enough to achieve both redundancy and reliability while still minimizing network traffic (the second conclusion in [2]), since the mirroring over the net can be done asynchronously - losing a transaction during a node's crash should have a negligible effect.

[1] http://labs.google.com/papers/gfs.html
[2] http://labs.google.com/papers/mapreduce.html

--Amos
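P.S. To make the cross-mirroring half concrete, here's roughly what one direction would look like as a DRBD resource. A sketch only - the hostnames, devices and addresses are invented, and protocol A is DRBD's asynchronous mode, matching the "losing a transaction in a crash is tolerable" assumption above:

  resource r0 {
    protocol A;              # asynchronous: local writes complete
                             # before the peer acknowledges them
    on node1 {
      device    /dev/drbd0;
      disk      /dev/sdb1;   # backing partition on node1
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on node2 {
      device    /dev/drbd0;
      disk      /dev/sdb1;   # backing partition on node2
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }

Cross-mirroring would just mean a second resource (r1) defined the other way around, so each node carries its partner's data alongside its own and can pick up its partner's job queue on failover.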