On Tue, 2009-11-10 at 12:10 -0500, Erik Garrison wrote:
> I recently encountered JFS filesystem corruption on a system which
> suffered memory corruption.  I was unaware of this failure until
> fsck's of the filesystem in question failed and caused system
> instability.  I ran memtest86+ and discovered that several bits of ram
> had failed.  The error messages from the first fsck failure were
> roughly "duplicate filehandle", but I don't have logs so I can't
> provide an exact report.
> 
> I removed the bad ram and began efforts to recover the system.  I then
> booted the system using an Ubuntu Karmic live CD and tried to back up
> the data via a simple cp -a <src> <dest>.  This failed upon reaching
> one of the corrupted files, and additionally left the target (also
> JFS) filesystem damaged.  I had to reformat the target filesystem and
> try again.
> 
> I was able to use ls -Rl to discover which files had been lost.  But
> it provided very cryptic error messages:
> 
> .....
> ls: cannot access ./lib/firmware/2.6.28-3-rt/emi62: Stale NFS file handle
> ls: cannot access ./lib/firmware/2.6.28-3-rt/korg: Stale NFS file handle
> ls: cannot access ./lib/firmware/2.6.28-3-rt/sun: Stale NFS file handle
> ls: cannot access
> ./lib/linux-restricted-modules/2.6.28-15-generic/wlan: Stale NFS file
> handle
> ls: cannot access
> ./lib/modules/2.6.27-14-generic/kernel/drivers/input/joystick/interact.ko:
> Stale NFS file handle
> ls: cannot access
> ./lib/modules/2.6.27-14-generic/kernel/drivers/input/joystick/joydump.ko:
> Stale NFS file handle
> ls: cannot access
> ./lib/modules/2.6.27-14-generic/kernel/drivers/input/joystick/magellan.ko:
> Stale NFS file handle
> ls: cannot access
> ./lib/modules/2.6.27-14-generic/kernel/drivers/input/joystick/sidewinder.ko:
> Stale NFS file handle
> .....
> 
> To my knowledge none of these files were ever mounted via NFS.

JFS is incorrectly returning the error code -ESTALE in some situations
where the metadata isn't what is expected.  That error code should
really only be used by nfs or interfaces used by nfs.  It's a bug, but
nothing really serious, since the fix would be simply to return another
error which is less cryptic.
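
For anyone who hits the same message: userspace just sees errno ESTALE,
and the C library turns that into the "Stale NFS file handle" text that
ls printed.  Below is a minimal sketch, not part of JFS or any existing
tool, of walking a tree and collecting the paths that fail this way.
The function names is_stale and find_stale_paths are made up for
illustration.

```python
import errno
import os


def is_stale(exc):
    """Return True if this OSError carries ESTALE (the code behind the ls message)."""
    return exc.errno == errno.ESTALE


def find_stale_paths(root):
    """Walk root and return the paths whose lookup fails with ESTALE.

    Hypothetical helper: it only inspects metadata via lstat() and never
    reads file contents or writes to the filesystem.
    """
    stale = []

    def on_error(exc):
        # os.walk calls this when it cannot list a directory.
        if is_stale(exc) and exc.filename is not None:
            stale.append(exc.filename)

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
            except OSError as exc:
                if is_stale(exc):
                    stale.append(path)
    return stale
```

Run against a read-only mount of the damaged filesystem, the returned
list should correspond to the entries ls complained about, and could be
written out one path per line as the basis for an exclusion list.  Other
errors (EIO and so on) are deliberately ignored here; a real recovery
script would probably want to record those as well.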

> I used this list of failures as an exclusion list for rsync and was
> then able to save almost everything from the ro-mounted filesystem.
> 
> I am waiting to reformat the partition because I wanted to file a bug
> report to document the case.  Are there any recommendations as to what
> data I should try to save from the disk and how I should find it?  I
> want to complete this step quickly as I need to use the system for
> work purposes.

Thanks for reporting the bug.  There really isn't a need to preserve the
damaged file system.  The bugs in the code can easily be found by
grepping for ESTALE.

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

