RE: new features time-line
On Friday October 13, [EMAIL PROTECTED] wrote:
> Good to hear.
>
> I think when I first built my RAID (a few years ago) I did some research on
> this:
> http://www.google.com/search?hl=en&q=bad+block+replacement+capabilities+mdadm
>
> And found stories where bit errors were an issue.
> http://www.ogre.com/tiki-read_article.php?articleId=7
>
> After your email, I went out and researched it again. Eleven months ago a
> patch to address this was submitted for RAID5; I would assume RAID6
> benefited from it too?

Yes. All appropriate RAID levels support auto-overwrite of read errors.

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
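[Editor's sketch] The auto-overwrite behaviour Neil describes can be exercised and observed from userspace via md's sysfs interface. The following is a minimal sketch, assuming an array at /dev/md0 and a kernel recent enough to expose these files (paths are documented in the kernel's md documentation):

```shell
# Kick off a background media scan: md reads every sector of every
# member and rewrites any that fail to read, reconstructing the data
# from parity (RAID5/6) or the mirror (RAID1/10).
echo check > /sys/block/md0/md/sync_action

# Per-member counters of read errors that were corrected in place
# rather than causing the device to be kicked out of the array.
for dev in /sys/block/md0/md/dev-*; do
    printf '%s: %s corrected read errors\n' \
        "$(basename "$dev")" "$(cat "$dev/errors")"
done
```

This requires root and a live array, so treat it as illustrative rather than something to paste blindly.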
Re: ANNOUNCE: mdadm 2.5.4 - A tool for managing Soft RAID under Linux
On Friday October 13, [EMAIL PROTECTED] wrote:
> On Fri, Oct 13, 2006 at 10:15:35AM +1000, Neil Brown wrote:
> >
> > I am pleased to announce the availability of
> >    mdadm version 2.5.4
>
> it looks like you did not include the patches i posted against 2.5.3

No... sorry about that. They are all in the git tree now.

Thanks,
NeilBrown
Re: possible deadlock through raid5/md
While travelling the last few days, a theory has occurred to me to explain
this sort of thing ...

> A user has sent me a ps ax output showing an enbd client daemon
> blocked in get_active_stripe (I presume in raid5.c).
>
> ps ax -of,uid,pid,ppid,pri,ni,vsz,rss,wchan:30,stat,tty,time,command
>
> F UID PID   PPID PRI NI VSZ  RSS  WCHAN             STAT TT TIME
> COMMAND
> 5 0   26540 1    23  0  2140 1048 get_active_stripe Ds   ?  00:00:00
> enbd-client iss04 1300 -i iss04-hdd -n 2 -e -m -b 4096 -p 30 /dev/ndl

Suppose that memory is full of dirty buffers, and that the _transport_ for
the medium on which one of the raid disks is running (in this case tcp,
under enbd and elsewhere) needs buffers. It needs buffers both to read and
to write. But there are none available, so the call through the user
process which wants to use the transport causes the kernel to try and free
pages. That causes the user process to end up in the kernel routines which
try to flush devices to disk, and through them in the various (request?)
functions of device drivers, and perhaps even in raid5's
get_active_stripe.

However, if that stripe is on a remote disk available through tcp, then
tcp is blocked by lack of the very resources the kernel is trying to free,
so we are in deadlock? Sound plausible?

The cure ought to be to keep some kernel memory available for tcp that is
not available to dirty buffers.

Peter
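[Editor's sketch] One coarse mitigation along the lines Peter suggests is to enlarge the pool of pages the kernel keeps free for critical allocations, so that network receive paths are less likely to starve under dirty-buffer pressure. A sketch, where the value is purely illustrative and must be tuned to the machine:

```shell
# Inspect the current reserve of pages the kernel keeps free for
# critical (e.g. atomic / network) allocations.
sysctl vm.min_free_kbytes

# Raise it (root required). 65536 KiB is an illustrative value only;
# too large a reserve wastes memory, too small risks the starvation
# described above.
sysctl -w vm.min_free_kbytes=65536
```

This does not remove the fundamental circular dependency (writeback over a network transport that itself needs memory), it only widens the safety margin.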
Re: Multiple Disk Failure Recovery
On Sat, Oct 14, 2006 at 11:35:31PM -0700, David Rees wrote:
> On 10/14/06, Lane Brooks <[EMAIL PROTECTED]> wrote:
> > functioning. Right now I cannot get a spare disk recovery to finish
> > because of these bad sectors. Is there a way to force as much recovery
> > as possible so that I can replace this newly faulty drive?
>
> One technique is to use ddrescue to create an image of the failing
> drive(s) (I would image all drives if possible) and use those images
> to try to retrieve your data.

I'm currently writing an error-suppression md layer precisely for
situations like this, to remove the need for the dd step. Obviously it
doesn't recover the data in the bad sectors or masked by errors, but the
aim is to reduce the need for the multiple copies that recovery procedures
tend to involve.

TTFN
--
 Roger.  Home| http://www.sandman.uklinux.net/
         Master of Peng Shui.  (Ancient oriental art of Penguin Arranging)
 Work|Independent Sys Consultant | http://www.computer-surgery.co.uk/
 So what are the eigenvalues and eigenvectors of 'The Matrix'?  --anon
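[Editor's sketch] David's ddrescue approach might look like the following. The device names, mount point, and array name are hypothetical; the key points are that ddrescue's map file lets it resume and retry only the unread areas, and that assembling from the images leaves the failing originals untouched:

```shell
# Image each suspect member; /mnt/rescue must have enough free space.
# The map (log) file records which regions were read successfully, so
# a second invocation retries only the bad areas.
ddrescue /dev/sdb1 /mnt/rescue/sdb1.img /mnt/rescue/sdb1.map
ddrescue /dev/sdc1 /mnt/rescue/sdc1.img /mnt/rescue/sdc1.map

# Attach the images as block devices and assemble an array from the
# copies instead of the dying drives.
losetup /dev/loop0 /mnt/rescue/sdb1.img
losetup /dev/loop1 /mnt/rescue/sdc1.img
mdadm --assemble /dev/md1 /dev/loop0 /dev/loop1
```

Unreadable sectors come back as zeroed regions in the image, so some files may still be corrupt, but the array itself will no longer kick members out mid-copy.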
FW: Multiple Disk Failure Recovery
One could remove the spare drive from the system, then do the mdadm
--assemble --force to get it started and keep it from trying to
resync/recover.

Once you get the array up and 'limping', carefully pick the most important
stuff, copy it off the array, and hope the bad sectors didn't affect that
data. As you mentioned, you have bad sectors. If you try to copy it all,
or even as you pick the important stuff (I know it is all important;
that's why it was on the RAID), you will eventually hit data that has bad
sectors, and md will fail the affected drive and deactivate the array. At
that point accept that the data in that area is most likely gone. Then do
the mdadm --assemble --force again to get it started and move on to the
next areas of most important data. Could be a long cycle...

Aside from that, I am curious: was your spare disk shared by another
array? If it was not, then I would recommend you don't do a RAID5 with hot
spare next time, and do a RAID6 instead. But this is my personal feeling
and you can take it or leave it. I too at one point did a RAID5 with hot
spare, and I was using eight drives. So yes, you can have two drives fail,
as long as the delta between failures is long enough to allow the raid to
resync the spare in, and during the process there are no unknown bad
sectors on the remaining drives. But I got to thinking: if I am going to
be spinning/powering that "hot spare", I may as well do a RAID6. As long
as the hot spare is not shared with other arrays, I see no downside, and
it would protect you in the future from this problem.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lane Brooks
Sent: Saturday, October 14, 2006 9:29 PM
To: linux-raid@vger.kernel.org
Subject: Multiple Disk Failure Recovery

I have a RAID 5 setup with one spare disk. One of the disks failed, so I
replaced it, and while it was recovering, it found some bad sectors on
another drive (unreadable and uncorrectable SMART errors).
This generated a fail event and shut down the array. When I restart the
array with the force command, it starts the recovery process and dies when
it gets to the bad sectors, so I am stuck in an infinite loop.

I am wondering if there is a way to cut my losses with these bad sectors
and have it recover what it can, so that I can get my raid array back to
functioning. Right now I cannot get a spare disk recovery to finish
because of these bad sectors. Is there a way to force as much recovery as
possible so that I can replace this newly faulty drive?

Thanks
Lane
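[Editor's sketch] The 'limping' cycle suggested in the forwarded advice above might look like this. The array name, member devices, and paths are all hypothetical, and every step requires root:

```shell
# Force-assemble the degraded array WITHOUT the spare, so no resync
# starts, and mount read-only to avoid any writes to the sick members.
mdadm --assemble --force --run /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
mount -o ro /dev/md0 /mnt/raid

# Copy the most important data first; a read hitting a bad sector
# will fail that member and take the array down again.
rsync -a /mnt/raid/critical/ /backup/critical/

# After such a failure: stop, force-assemble again, and move on to
# the next-most-important area. Repeat as needed.
umount /mnt/raid || true
mdadm --stop /dev/md0
mdadm --assemble --force --run /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
```

Imaging the drives first (as suggested elsewhere in this thread) is safer, since each forced restart puts more stress on already-failing hardware.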