Re: raid5: I lost an XFS file system due to a minor IDE cable problem
On Monday 28 May 2007 04:17:18 David Chinner wrote:
> On Mon, May 28, 2007 at 03:50:17AM +0200, Pallai Roland wrote:
> > On Monday 28 May 2007 02:30:11 David Chinner wrote:
> > > On Fri, May 25, 2007 at 04:35:36PM +0200, Pallai Roland wrote:
> > > > ...and I've spammed such messages. This internal error isn't a good
> > > > reason to shut down the file system?
> > >
> > > Actually, that error does shut the filesystem down in most cases. When
> > > you see that output, the function is returning -EFSCORRUPTED. You've
> > > got a corrupted freespace btree. The reason why you get spammed is
> > > that this is happening during background writeback, and there is no
> > > one to return the -EFSCORRUPTED error to. The background writeback
> > > path doesn't specifically detect shut down filesystems or trigger
> > > shutdowns on errors, because that happens in different layers, so you
> > > just end up with failed data writes. These errors will occur on the
> > > next foreground data or metadata allocation, and that will shut the
> > > filesystem down at that point. I'm not sure that we should be ignoring
> > > EFSCORRUPTED errors here; maybe in this case we should be shutting
> > > down the filesystem. That would certainly cut down on the spamming and
> > > would not appear to change any other behaviour.
> >
> > If I remember correctly, my file system wasn't shut down at all; it was
> > writeable for the whole night, and yafc slowly wrote files to it. Maybe
> > all write operations had failed, but yafc doesn't warn.
>
> So you never created new files or directories, unlinked files or
> directories, did synchronous writes, etc? Just had slowly growing files?

I just overwrote badly downloaded files.

> > Spamming is just annoying when we need to find out what went wrong (my
> > kernel.log is 300MB), but for data security it's important to react to
> > the EFSCORRUPTED error in any case, I think. Please consider this.
>
> The filesystem has responded correctly to the corruption in terms of data
> security (i.e. it failed the data write and warned noisily about it), but
> it probably hasn't done everything it should. Hmmm.
> A quick look at the linux code makes me think that background writeback
> on linux has never been able to cause a shutdown in this case. However,
> the same error on Irix will definitely cause a shutdown.

I hope Linux will follow Irix; that's a consistent standpoint.

David, have you a plan to implement your reporting raid5 block layer idea?
No one else seems to care about this silent data loss on temporarily
(cable, power) failed raid5 arrays, as far as I can see; I really hope you
do at least!

--
d
-
To unsubscribe from this list: send the line unsubscribe linux-raid in the
body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
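The two behaviours discussed in this thread - background writeback swallowing -EFSCORRUPTED because there is no caller to hand it to, versus forcing a shutdown on corruption - can be modelled in a few lines of userspace C. This is only a sketch of the control flow: `force_shutdown()` and `writeback_one()` are illustrative stand-ins, not the real XFS functions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Linux XFS maps EFSCORRUPTED to EUCLEAN (117); value used for illustration. */
#define EFSCORRUPTED 117

static bool fs_shut_down;

/* illustrative stand-in for a forced filesystem shutdown */
static void force_shutdown(void)
{
    fs_shut_down = true;
}

/* Toy writeback step: the current behaviour just logs the error each time
 * (the "spam"); the proposed behaviour shuts the filesystem down on the
 * first corruption, so later writes fail fast instead of logging forever. */
static void writeback_one(int err, bool shutdown_on_corruption)
{
    if (err == -EFSCORRUPTED) {
        fprintf(stderr, "xfs: corrupted freespace btree\n");
        if (shutdown_on_corruption)
            force_shutdown();
    }
}
```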
Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.
Neil Brown writes:
[...]
> Thus the general sequence might be:
>
>  a/ issue all preceding writes.
>  b/ issue the commit write with BIO_RW_BARRIER
>  c/ wait for the commit to complete. If it was successful - done.
>     If it failed other than with EOPNOTSUPP, abort
>     else continue
>  d/ wait for all 'preceding writes' to complete
>  e/ call blkdev_issue_flush
>  f/ issue commit write without BIO_RW_BARRIER
>  g/ wait for commit write to complete
>     if it failed, abort
>  h/ call blkdev_issue_flush
>
> DONE
>
> Steps b and c can be left out if it is known that the device does not
> support barriers. The only way to discover this is to try and see if it
> fails.
>
> I don't think any filesystem follows all these steps.

It seems that steps b/ -- h/ are quite generic, and can be implemented
once in generic code (with some synchronization mechanism like a
wait-queue at d/).

> [...]
> Thank you for your attention.
> NeilBrown

Nikita.
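The generic b/ through h/ part of Neil's sequence can be sketched as a single helper. This is a userspace model of the control flow only: the `submit_*`/`wait_*`/`issue_flush` names are illustrative stubs standing in for the real bio submission paths, and step a/ (issuing the preceding writes) is assumed to have happened before the call.

```c
#include <errno.h>
#include <stdbool.h>

/* Learned lazily: the only way to discover missing barrier support is to
 * try a barrier write and see if it fails with EOPNOTSUPP (steps b/c). */
static bool barrier_supported = true;

/* Stub device: set to a negative errno to simulate failures. */
static int dev_barrier_result = 0;

static int submit_barrier_write(void)      { return dev_barrier_result; }
static int wait_preceding_writes(void)     { return 0; }
static int issue_flush(void)               { return 0; }
static int submit_plain_commit_write(void) { return 0; }

/* Returns 0 on success, negative errno on failure. */
int commit_write(void)
{
    if (barrier_supported) {
        int err = submit_barrier_write(); /* b/ issue commit with barrier */
        if (err == 0)                     /* c/ wait; success - done      */
            return 0;
        if (err != -EOPNOTSUPP)           /* c/ real I/O error - abort    */
            return err;
        barrier_supported = false;        /* fall back to d/ through h/   */
    }
    if (wait_preceding_writes())          /* d/ drain preceding writes    */
        return -EIO;
    if (issue_flush())                    /* e/ blkdev_issue_flush        */
        return -EIO;
    if (submit_plain_commit_write())      /* f/+g/ plain commit write     */
        return -EIO;
    return issue_flush();                 /* h/ flush again               */
}
```

Once the EOPNOTSUPP has been seen, every later commit skips straight to the flush-based path, which matches the "steps b and c can be left out" remark.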
Re: raid5: I lost an XFS file system due to a minor IDE cable problem
On Friday 25 May 2007 02:05:47 David Chinner wrote:
> -o ro,norecovery will allow you to mount the filesystem and get any
> uncorrupted data off it. You still may get shutdowns if you trip across
> corrupted metadata in the filesystem, though.

This filesystem is completely dead.

hq:~# mount -o ro,norecovery /dev/loop1 /mnt/r5
May 28 13:41:50 hq kernel: Mounting filesystem loop1 in no-recovery mode. Filesystem will be inconsistent.
May 28 13:41:50 hq kernel: XFS: failed to read root inode

hq:~# xfs_db /dev/loop1
xfs_db: cannot read root inode (22)
xfs_db: cannot read realtime bitmap inode (22)
Segmentation fault

hq:~# strace xfs_db /dev/loop1
_llseek(4, 0, [0], SEEK_SET) = 0
read(4, XFSB\0\0\20\0\0\0\0\0\6\374\253\0\0\0\0\0\0\0\0\0\0\0\0..., 512) = 512
pread(4, \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0..., 512, 480141901312) = 512
pread(4, \30G$L\203\33OE [EMAIL PROTECTED]\324\2074DY\323\6..., 8192, 131072) = 8192
write(2, xfs_db: cannot read root inode (..., 36) = 36
pread(4, \30G$L\203\33OE [EMAIL PROTECTED]\324\2074DY\323\6..., 8192, 131072) = 8192
write(2, xfs_db: cannot read realtime bit..., 47) = 47
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

Browsing with hexdump -C, it seems a part of a PDF file sits at 128KB, in
the place of the root inode. :(

--
d
Re: raid state diagram
Tomka Gergely wrote:
> Hi!
>
> I am drawing this picture for teaching:
> http://gergely.tomka.hu/kep/raidstates.png
>
> Is this a correct picture? I am not sure about the difference between
> active/clean and resync/recover. Thanks for any comment.

Looks good to me. Perhaps a copyright and Creative Commons license
statement would be useful to clarify what is fair use. That allows others
to use it for free projects, but you would still control commercial use.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: raid5: I lost an XFS file system due to a minor IDE cable problem
On Mon, May 28, 2007 at 01:17:31PM +0200, Pallai Roland wrote:
> On Monday 28 May 2007 04:17:18 David Chinner wrote:
> > Hmmm. A quick look at the linux code makes me think that background
> > writeback on linux has never been able to cause a shutdown in this
> > case. However, the same error on Irix will definitely cause a shutdown.
>
> I hope Linux will follow Irix; that's a consistent standpoint.

I raised a bug for this yesterday when writing that reply. It won't get
forgotten now.

> David, have you a plan to implement your reporting raid5 block layer
> idea? No one else seems to care about this silent data loss on
> temporarily (cable, power) failed raid5 arrays, as far as I can see; I
> really hope you do at least!

Yeah, I'd love to get something like this happening, but given that it's
about half way down my list of stuff to do when I have some spare time,
I'd say it will be about 2015 before I get to it.

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
Re: raid5: I lost an XFS file system due to a minor IDE cable problem
On Mon, May 28, 2007 at 05:30:52PM +0200, Pallai Roland wrote:
> On Monday 28 May 2007 14:53:55 Pallai Roland wrote:
> > On Friday 25 May 2007 02:05:47 David Chinner wrote:
> > > -o ro,norecovery will allow you to mount the filesystem and get any
> > > uncorrupted data off it. You still may get shutdowns if you trip
> > > across corrupted metadata in the filesystem, though.
> >
> > This filesystem is completely dead. [...]
>
> I tried to make a md patch to stop writes if a raid5 array has got 2+
> failed drives, but I found it's already done, oops. :) handle_stripe5()
> quietly ignores writes in this case; I tried it and it works.

Hmmm - it clears the uptodate bit on the bio, which is supposed to make
the bio return EIO. That looks to be doing the right thing...

> There's another layer I used on this box between md and xfs: loop-aes.

Oh, that's a kind of important thing to forget to mention....

> I used it for years and it was rock stable, but now it's my first
> suspect, because I found a bug in it today: I assembled my array from
> n-1 disks, I failed a second disk for a test, and I found /dev/loop1
> still provides *random* data where /dev/md1 serves nothing; it's
> definitely a loop-aes bug: . It's not an explanation for my screwed up
> file system, but for me it's enough to drop loop-aes.

Eh. If you can get random data back instead of an error from the block
device, then I'm not surprised your filesystem is toast. If it's one
sector in a larger block that is corrupted, then the only thing that will
protect you from this sort of corruption causing problems is metadata
checksums (yet another thing on my list of stuff to do).

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
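The uptodate-bit convention Dave refers to can be modelled in a few lines: in the 2.6-era bio interface, the completion handler derives success from the BIO_UPTODATE flag, so clearing it is exactly how a lower layer like handle_stripe5() turns a write into -EIO for the filesystem. A simplified userspace model; the flag value and struct layout are illustrative, not the real `<linux/bio.h>` definitions.

```c
#include <errno.h>

/* Simplified model of the 2.6-era bio completion convention. */
#define BIO_UPTODATE 0UL

struct bio {
    unsigned long bi_flags;
    int bi_error;   /* where our model records the completion outcome */
};

/* What the raid5 code does with a write it cannot service: clear the
 * uptodate bit before completing the bio. */
static void fail_bio(struct bio *bio)
{
    bio->bi_flags &= ~(1UL << BIO_UPTODATE);
}

/* The completion side: a missing uptodate bit becomes -EIO for the caller. */
static void bio_endio_model(struct bio *bio)
{
    bio->bi_error = (bio->bi_flags & (1UL << BIO_UPTODATE)) ? 0 : -EIO;
}
```

The loop-aes bug in the thread is precisely a violation of this contract: returning (random) data with the bio still marked uptodate, instead of failing it.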
Re: raid5: I lost an XFS file system due to a minor IDE cable problem
On Mon, May 28, 2007 at 05:45:27PM -0500, Alberto Alonso wrote:
> On Fri, 2007-05-25 at 18:36 +1000, David Chinner wrote:
> > On Fri, May 25, 2007 at 12:43:51AM -0500, Alberto Alonso wrote:
> > > I think his point was that going into a read only mode causes a less
> > > catastrophic situation (i.e. a web server can still serve pages).
> >
> > Sure - but once you've detected one corruption or had metadata I/O
> > errors, can you trust the rest of the filesystem?
>
> I think that is a valid point. Rather than shutting down the file system
> completely, an automatic switch to where the least disruption of service
> can occur is always desired.

I consider the possibility of serving out bad data (i.e. after a remount
to read-only) to be the worst possible disruption of service that can
happen ;)

> I guess it does depend on the nature of the failure. A write failure on
> block 2000 does not imply corruption of the other 2TB of data.

The rest might not be corrupted, but if block 2000 is an index of some
sort (i.e. metadata), you could reference any of that 2TB incorrectly and
get the wrong data, write to the wrong spot on disk, etc.

> I personally have found the XFS file system to be great for my needs
> (except issues with NFS interaction, where the bug report never got
> answered), but that doesn't mean it cannot be improved.
>
> > Got a pointer? I can't seem to find it.
>
> I'm pretty sure I used bugzilla to report it. I did find the kernel dump
> file though, so here it is:
>
> Oct 3 15:34:07 localhost kernel: xfs_iget_core: ambiguous vns: vp/0xd1e69c80, invp/0xc989e380

Oh, I haven't seen any of those problems for quite some time.

> = /proc/kmsg started.
> Oct 3 15:51:23 localhost kernel: Inspecting /boot/System.map-2.6.8-2-686-smp

Oh, well, yes, kernels that old did have that problem. It got fixed some
time around 2.6.12 or 2.6.13 IIRC.

Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
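The metadata checksums mentioned in this thread work roughly as follows: each metadata block carries a stored checksum that is recomputed on read, so a stale, torn, or overwritten sector no longer verifies and can be failed instead of trusted. A toy sketch, with an Adler-32-style sum standing in for the CRC32c a real implementation would use:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 512

/* Adler-32-style rolling sum; a stand-in for CRC32c, good enough to
 * demonstrate the detect-on-read idea. */
static uint32_t meta_sum(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* The checksum field is zeroed while computing, as on-disk formats
 * typically do, so the stored value does not checksum itself. */
struct meta_block {
    uint32_t csum;
    unsigned char payload[BLOCK_SIZE - sizeof(uint32_t)];
};

static void meta_block_seal(struct meta_block *mb)
{
    mb->csum = 0;
    mb->csum = meta_sum((const unsigned char *)mb, sizeof(*mb));
}

/* Returns nonzero if the block verifies; a corrupted sector fails here
 * instead of being silently used as metadata. */
static int meta_block_verify(const struct meta_block *mb)
{
    struct meta_block tmp;

    memcpy(&tmp, mb, sizeof(tmp));
    tmp.csum = 0;
    return meta_sum((const unsigned char *)&tmp, sizeof(tmp)) == mb->csum;
}
```

With a check like this at read time, the PDF data found in place of the root inode earlier in the thread would have failed verification rather than crashing xfs_db.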