Re: raid5: I lost a XFS file system due to a minor IDE cable problem

2007-05-28 Thread Pallai Roland

On Monday 28 May 2007 04:17:18 David Chinner wrote:
 On Mon, May 28, 2007 at 03:50:17AM +0200, Pallai Roland wrote:
  On Monday 28 May 2007 02:30:11 David Chinner wrote:
   On Fri, May 25, 2007 at 04:35:36PM +0200, Pallai Roland wrote:
...and I've been spammed with such messages. Isn't this internal error a
good enough reason to shut down the file system?
  
   Actually, that error does shut the filesystem down in most cases. When
   you see that output, the function is returning -EFSCORRUPTED. You've
   got a corrupted freespace btree.
  
   The reason why you get spammed is that this is happening during
   background writeback, and there is no one to return the -EFSCORRUPTED
   error to. The background writeback path doesn't specifically detect
   shut down filesystems or trigger shutdowns on errors because that
   happens in different layers so you just end up with failed data writes.
   These errors will occur on the next foreground data or metadata
   allocation and that will shut the filesystem down at that point.
  
   I'm not sure that we should be ignoring EFSCORRUPTED errors here; maybe
   in this case we should be shutting down the filesystem.  That would
   certainly cut down on the spamming and would not appear to change any
   other behaviour.
 
   If I remember correctly, my file system wasn't shut down at all; it
  was writeable for the whole night, and yafc slowly wrote files to it.
  Maybe all write operations failed, but yafc didn't warn.

 So you never created new files or directories, unlinked files or
 directories, did synchronous writes, etc? Just had slowly growing files?
 I just overwrote badly downloaded files.

   The spamming is just annoying when we need to find out what went wrong (my
  kernel.log is 300MB), but for data security it's important to react to the
  EFSCORRUPTED error in any case, I think. Please consider this.

 The filesystem has responded correctly to the corruption in terms of
 data security (i.e. failed the data write and warned noisily about
 it), but it probably hasn't done everything it should

 Hmmm. A quick look at the linux code makes me think that background
 writeback on linux has never been able to cause a shutdown in this
 case. However, the same error on Irix will definitely cause a
 shutdown, though.
 I hope Linux will follow Irix, that's a consistent standpoint.


 David, do you have a plan to implement your raid5 error-reporting block layer
idea? No one else seems to care about this silent data loss on temporarily
(cable, power) failed raid5 arrays as far as I can see; I really hope you do at
least!


--
 d

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] BIO_RW_BARRIER - what it means for devices, filesystems, and dm/md.

2007-05-28 Thread Nikita Danilov
Neil Brown writes:
  

[...]

  Thus the general sequence might be:
  
a/ issue all preceding writes.
b/ issue the commit write with BIO_RW_BARRIER
c/ wait for the commit to complete.
   If it was successful - done.
   If it failed other than with EOPNOTSUPP, abort
   else continue
d/ wait for all 'preceding writes' to complete
e/ call blkdev_issue_flush
f/ issue commit write without BIO_RW_BARRIER
g/ wait for commit write to complete
 if it failed, abort
h/ call blkdev_issue_flush
DONE
  
  steps b and c can be left out if it is known that the device does not
  support barriers.  The only way to discover this is to try it and see
  if it fails.
  
  I don't think any filesystem follows all these steps.

It seems that steps b/ -- h/ are quite generic, and could be implemented
once in generic code (with some synchronization mechanism like a
wait-queue at d/).


[...]

  
  Thank you for your attention.
  
  NeilBrown
  

Nikita.


Re: raid5: I lost a XFS file system due to a minor IDE cable problem

2007-05-28 Thread Pallai Roland

On Friday 25 May 2007 02:05:47 David Chinner wrote:
 -o ro,norecovery will allow you to mount the filesystem and get any
 uncorrupted data off it.

 You still may get shutdowns if you trip across corrupted metadata in
 the filesystem, though.
This filesystem is completely dead.

hq:~# mount -o ro,norecovery /dev/loop1 /mnt/r5
May 28 13:41:50 hq kernel: Mounting filesystem loop1 in no-recovery mode. Filesystem will be inconsistent.
May 28 13:41:50 hq kernel: XFS: failed to read root inode

hq:~# xfs_db /dev/loop1
xfs_db: cannot read root inode (22)
xfs_db: cannot read realtime bitmap inode (22)
Segmentation fault

hq:~# strace xfs_db /dev/loop1
_llseek(4, 0, [0], SEEK_SET)            = 0
read(4, "XFSB\0\0\20\0\0\0\0\0\6\374\253\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
pread(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 480141901312) = 512
pread(4, "\30G$L\203\33OE [EMAIL PROTECTED]\324\2074DY\323\6"..., 8192, 131072) = 8192
write(2, "xfs_db: cannot read root inode ("..., 36xfs_db: cannot read root inode (22)
) = 36
pread(4, "\30G$L\203\33OE [EMAIL PROTECTED]\324\2074DY\323\6"..., 8192, 131072) = 8192
write(2, "xfs_db: cannot read realtime bit"..., 47xfs_db: cannot read realtime bitmap inode (22)
) = 47
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++


Browsing with hexdump -C, it seems like part of a PDF file is at 128KB, in
place of the root inode. :(


--
 d



Re: raid state diagram

2007-05-28 Thread Bill Davidsen

Tomka Gergely wrote:

Hi!

I am drawing this picture for teaching:

http://gergely.tomka.hu/kep/raidstates.png

Is this a correct picture? I am not sure about the difference between
active/clean and resync/recover.


Thanks for any comments.


Looks good to me; perhaps a copyright and Creative Commons license 
statement would be useful to clarify what is fair use. That allows 
others to use it for free projects, but you would still control 
commercial use.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: raid5: I lost a XFS file system due to a minor IDE cable problem

2007-05-28 Thread David Chinner
On Mon, May 28, 2007 at 01:17:31PM +0200, Pallai Roland wrote:
 On Monday 28 May 2007 04:17:18 David Chinner wrote:
  Hmmm. A quick look at the linux code makes me think that background
  writeback on linux has never been able to cause a shutdown in this case.
  However, the same error on Irix will definitely cause a shutdown,
  though.
  I hope Linux will follow Irix, that's a consistent standpoint.

I raised a bug for this yesterday while writing that reply. It won't
get forgotten now.

  David, do you have a plan to implement your raid5 error-reporting block
  layer idea?  No one else seems to care about this silent data loss on
  temporarily (cable, power) failed raid5 arrays as far as I can see; I
  really hope you do at least!

Yeah, I'd love to get something like this happening, but given it's about
half way down my list of stuff to do when I have some spare time I'd
say it will be about 2015 before I get to it.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: raid5: I lost a XFS file system due to a minor IDE cable problem

2007-05-28 Thread David Chinner
On Mon, May 28, 2007 at 05:30:52PM +0200, Pallai Roland wrote:
 
 On Monday 28 May 2007 14:53:55 Pallai Roland wrote:
  On Friday 25 May 2007 02:05:47 David Chinner wrote:
   -o ro,norecovery will allow you to mount the filesystem and get any
   uncorrupted data off it.
  
   You still may get shutdowns if you trip across corrupted metadata in
   the filesystem, though.
 
  This filesystem is completely dead.
  [...]
 
  I tried to write an md patch to stop writes if a raid5 array has 2+ failed
 drives, but I found it's already done, oops. :) handle_stripe5() quietly
 ignores writes in this case; I tried it and it works.

Hmmm - it clears the uptodate bit on the bio, which is supposed to
make the bio return EIO. That looks to be doing the right thing...

  There's another layer I used on this box between md and xfs: loop-aes. I 

Oh, that's kind of an important thing to forget to mention...

 have used it for years and it has been rock stable, but now it's my first 
 suspect, because I found a bug in it today:
  I assembled my array from n-1 disks and failed a second disk as a test, 
 and I found /dev/loop1 still provides *random* data where /dev/md1 serves 
 nothing; it's definitely a loop-aes bug:

[...]

  It's not an explanation for my screwed-up file system, but for me it's 
 enough to drop loop-aes. Eh.

If you can get random data back instead of an error from the block device,
then I'm not surprised your filesystem is toast. If it's one sector in a
larger block that is corrupted, then the only thing that will protect you from
this sort of corruption causing problems is metadata checksums (yet another
thing on my list of stuff to do).

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: raid5: I lost a XFS file system due to a minor IDE cable problem

2007-05-28 Thread David Chinner
On Mon, May 28, 2007 at 05:45:27PM -0500, Alberto Alonso wrote:
 On Fri, 2007-05-25 at 18:36 +1000, David Chinner wrote:
  On Fri, May 25, 2007 at 12:43:51AM -0500, Alberto Alonso wrote:
   I think his point was that going into a read-only mode causes a
   less catastrophic situation (i.e. a web server can still serve
   pages).
  
  Sure - but once you've detected one corruption or had metadata
  I/O errors, can you trust the rest of the filesystem?
  
   I think that is a valid point: rather than shutting down the file
   system completely, an automatic switch to whatever mode causes the
   least disruption of service is always desired.
  
  I consider the possibility of serving out bad data (i.e. after
  a remount to readonly) to be the worst possible disruption of
  service that can happen ;)
 
 I guess it does depend on the nature of the failure. A write failure
 on block 2000 does not imply corruption of the other 2TB of data.

The rest might not be corrupted, but if block 2000 is an index of
some sort (i.e. metadata), you could reference any of that 2TB
incorrectly and get the wrong data, write to the wrong spot on disk,
etc.

   I personally have found the XFS file system to be great for
   my needs (except issues with NFS interaction, where the bug report
   never got answered), but that doesn't mean it cannot be improved.
  
  Got a pointer?
 
 I can't seem to find it. I'm pretty sure I used bugzilla to report
 it. I did find the kernel dump file though, so here it is:
 
 Oct  3 15:34:07 localhost kernel: xfs_iget_core: ambiguous vns:
 vp/0xd1e69c80, invp/0xc989e380

Oh, I haven't seen any of those problems for quite some time.

 = /proc/kmsg started.
 Oct  3 15:51:23 localhost kernel:
 Inspecting /boot/System.map-2.6.8-2-686-smp

Oh, well, yes, kernels that old did have that problem. It got fixed
some time around 2.6.12 or 2.6.13, IIRC.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group