I didn't catch any errors in syslog.
I really should have gotten that strace. I was sick when I
finalized the recovery, so perhaps that is why it slipped my mind.
Is there any way that the system could both copy the bad (corrupting)
data and still raise the error? If the ESTALE error isn't handled
properly up the stack then perhaps the corrupted inodes could be
copied.
This seems to have been a problem for ext4:
e6f009b0b45220c004672d41a58865e94946104d
ext4: return -EIO not -ESTALE on directory traversal through deleted inode
For convenience, here's the full commit:
commit e6f009b0b45220c004672d41a58865e94946104d
Author: Bryan Donlan <[email protected]>
Date: Sun Feb 22 21:20:25 2009 -0500
ext4: return -EIO not -ESTALE on directory traversal through deleted inode
ext4_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly. However, in ext4_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted
such that a directory entry references a deleted inode. This leads to
a misleading error message - "Stale NFS file handle" - and confusion
on the part of the admin.
The bug can be easily reproduced by creating a new filesystem, making
a link to an unused inode using debugfs, then mounting and attempting
to ls -l said link.
This patch thus changes ext4_lookup to return -EIO if it receives
-ESTALE from ext4_iget(), as ext4 does for other filesystem metadata
corruption; and also invokes the appropriate ext*_error functions when
this case is detected.
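To make the translation described in that commit message concrete, here is a minimal userspace sketch. It is not kernel code: ERR_PTR, PTR_ERR and IS_ERR are simplified stand-ins for the kernel helpers, and sketch_iget(), sketch_lookup() and struct sketch_inode are hypothetical names made up for illustration. The only point is that -ESTALE from the iget step becomes -EIO in lookup, while other errors pass through unchanged.

```c
#include <assert.h>	/* only needed for the checks in the usage note */
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical inode type, for illustration only. */
struct sketch_inode {
	unsigned long ino;
};

static struct sketch_inode live_inode = { 2 };

/*
 * Hypothetical iget: inode 2 is live, inode 0 is invalid, and every
 * other number is treated as a deleted inode (-ESTALE).
 */
void *sketch_iget(unsigned long ino)
{
	if (ino == 2)
		return &live_inode;
	if (ino == 0)
		return ERR_PTR(-EINVAL);
	return ERR_PTR(-ESTALE);
}

/*
 * Lookup translates -ESTALE into -EIO, because a directory entry that
 * points at a deleted inode means on-disk corruption, not a stale NFS
 * handle. Any other error is returned as-is.
 */
void *sketch_lookup(unsigned long ino)
{
	void *ip = sketch_iget(ino);

	if (IS_ERR(ip) && PTR_ERR(ip) == -ESTALE)
		return ERR_PTR(-EIO);
	return ip;
}
```

For example, PTR_ERR(sketch_lookup(7)) gives -EIO even though sketch_iget(7) itself reports -ESTALE, and sketch_lookup(2) returns a valid pointer.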
I have adapted this patch to the jfs case:
diff --git a/fs/jfs/namei.c b/fs/jfs/namei.c
index c79a427..0bbd489 100644
--- a/fs/jfs/namei.c
+++ b/fs/jfs/namei.c
@@ -1471,9 +1471,15 @@ static struct dentry *jfs_lookup(struct inode *dip, struct dentry *dentry,
}
ip = jfs_iget(dip->i_sb, inum);
- if (IS_ERR(ip)) {
- jfs_err("jfs_lookup: iget failed on inum %d", (uint) inum);
- return ERR_CAST(ip);
+ if (unlikely(IS_ERR(ip))) {
+ if (PTR_ERR(ip) == -ESTALE) {
+ jfs_err("deleted inode referenced: %u",
+ (uint) inum);
+ return ERR_PTR(-EIO);
+ } else {
+ jfs_err("jfs_lookup: iget failed on inum %d", (uint) inum);
+ return ERR_CAST(ip);
+ }
}
dentry = d_splice_alias(ip, dentry);
Testing will require: "creating a new filesystem, making a link to
an unused inode using debugfs, then mounting and attempting to ls -l
said link."
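For the "ls -l" step, a small userspace probe could make the result easier to read than the ls error text. This is only a sketch under my own assumptions: probe_lookup(), describe_lookup_errno() and report_lookup() are hypothetical helper names, and I am assuming that lstat() on the bad link surfaces the same errno that ls would see.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Return 0 if the path can be looked up, otherwise the errno from lstat(). */
int probe_lookup(const char *path)
{
	struct stat st;

	if (lstat(path, &st) == 0)
		return 0;
	return errno;
}

/* Map the errno to what it would mean for this particular test. */
const char *describe_lookup_errno(int err)
{
	switch (err) {
	case 0:
		return "lookup succeeded";
	case ESTALE:
		return "ESTALE reached userspace (unpatched behaviour)";
	case EIO:
		return "EIO (what the patch is meant to return)";
	default:
		return "other error";
	}
}

/* Print one line per probed path. */
void report_lookup(const char *path)
{
	int err = probe_lookup(path);

	printf("%s: %s [%s]\n", path, describe_lookup_errno(err),
	       err ? strerror(err) : "no error");
}
```

Calling report_lookup() on the link to the unused inode, before and after applying the patch, should show whether the errno changes from ESTALE to EIO.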
Where should I submit the patch after I've tested it?
Erik
On Wed, Nov 11, 2009 at 8:13 PM, Dave Kleikamp
<[email protected]> wrote:
> On Wed, 2009-11-11 at 22:11 +0100, Andi Kleen wrote:
>> Erik Garrison <[email protected]> writes:
>> >
>> > I removed the bad ram and began efforts to recover the system. I then
>> > booted the system using an Ubuntu Karmic live CD and tried to back up
>> > the data via a simple cp -a <src> <dest>. This failed upon reaching
>> > one of the corrupted files, and additionally left the target (also
>> > JFS) filesystem damaged. I had to reformat the target filesystem and
>> > try again.
>>
>> Leaving the target damaged too when another file system threw an error
>> sounds like a serious bug. Are you sure the new hardware was good?
>
> I had skimmed over this too quick and missed that. Yeah. That
> shouldn't happen. Were there any I/O errors in the syslog?
>
> Thanks,
> Shaggy
> --
> David Kleikamp
> IBM Linux Technology Center
>
>