Hi,

This is a followup for Debian bug <http://bugs.debian.org/292290>.

Joost van Baal <[EMAIL PROTECTED]> - Wed, Jan 26, 2005:
> `./lib/modules/2.6.10-1-k7/kernel/drivers/atm/zatm.ko': Unknown error 990
> I've heard of one other victim of this problem with this kernel.

Wessel Dankers <[EMAIL PROTECTED]> - Thu, Jan 27, 2005:
> I myself have been a victim of this too, so I thought I'd join in.

Well, me too.

> - the kernel was Debian's 2.6.8;
> - the filesystem in question was XFS;
> - software raid1 (mirroring) was used.
> XFS complained about corrupted in-memory structures in some of the cases.
> However, it is very unlikely that all three machines have bad RAM, and
> memtest86+ reports no problems.

I am also using Debian's kernel-image-2.6.8-2-686, version 2.6.8-13.
First of all, I'm using a PIV, so this isn't K7-specific. I am NOT using
RAID 1 nor LVM, just pure XFS.

The first corruption appeared in my "mail/debian-project/" folder,
precisely on the "tmp/" subdirectory. The second appeared today, on
./usr/share/doc/texmf/help/Catalogue/entries/romannum.html:

dpkg: error processing /var/cache/apt/archives/tetex-doc_2.0.2c-6_all.deb (--unpack):
unable to stat `./usr/share/doc/texmf/help/Catalogue/entries/romannum.html' (which I was about to install): Unknown error 990

This seems to be a really serious XFS problem.

To understand the problem, I tried stracing:

bee% LC_ALL=C strace -f ls debian-project-fucked/tmp 2>&1
...
rt_sigprocmask(SIG_UNBLOCK, [RTMIN], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
brk(0) = 0x805b000
brk(0x807c000) = 0x807c000
brk(0) = 0x807c000
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=24, ws_col=80, ws_xpixel=644, ws_ypixel=388}) = 0
stat64("debian-project-fucked/tmp", {st_mode=0, st_size=0, ...}) = -990
write(2, "ls: ", 4ls: ) = 4
write(2, "debian-project-fucked/tmp", 25debian-project-fucked/tmp) = 25
write(2, ": Unknown error 990", 19: Unknown error 990) = 19
write(2, "\n", 1
) = 1

The problem seems to occur with the stat64() syscall, but I couldn't
find out what error 990 is supposed to be in the /usr/include headers,
so I moved on to the kernel source and looked at the various syscall
implementations.

I also tried to understand which syscalls could trigger the problem.
I checked with:

bee% LC_ALL=C strace zsh -e -c "cd debian-project-fucked/tmp; ls"

and got the error with chdir() too, and hence looked at sys_chdir().

Then I checked whether this was directory-specific, and tried:

bee% LC_ALL=C strace -f ls -i \
    /usr/share/doc/texmf/help/Catalogue/entries/ 2>&1

I got errors on a bunch of files, in lstat64().

Then I looked upstream, first at bugme.osdl.org, and found:
http://bugme.osdl.org/show_bug.cgi?id=3224 (still open)

Finally, I looked at SGI's bugzilla, and found a first bug bubble:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=197

The problem also seems to appear in a comment of:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=383

197 is really worth reading, and using MD / LVM devices seems to help
trigger the bug.

These are dups of the above:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=204
http://oss.sgi.com/bugzilla/show_bug.cgi?id=207

The final patch attached to the bug report is:
http://oss.sgi.com/bugzilla/attachment.cgi?id=59&action=view

I couldn't find an applied version of it in the kernel: the code looked
too different, although xfs_finish_reclaim_all() was there... 2.6.8 was
released in August 2004, and the patch mentioned dates from January
2003, so I can only think we are facing a different bug.

Then I went thoroughly through the bugzilla and found another bug which
might be related:
http://oss.sgi.com/bugzilla/show_bug.cgi?id=338 is on a 2.4 kernel

When I found out that error 990 means EFSCORRUPTED, I thought I wouldn't
be able to track down the problem any further... So I'm about to get a
fresh xfsprogs or a live CD and xfs_repair my FS to get a log and send
it upstream.

Regards,
-- 
Loïc Minier <[EMAIL PROTECTED]>
"Neutral President: I have no strong feelings one way or the other."
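
For anyone wanting to check a whole tree for the same symptom, here is a rough sketch in Python of the lstat() check done by hand with strace above. The function name find_unstattable and its injectable lstat parameter are made up for illustration; the only value taken from this report is 990, the error stat64() returned. It only reads metadata and repairs nothing.

```python
import os

# Error number seen in the strace output (stat64() = -990); the report
# identifies it as EFSCORRUPTED. It has no name in the errno module.
EFSCORRUPTED = 990


def find_unstattable(root, wanted_errno=EFSCORRUPTED, lstat=os.lstat):
    """Return the sorted paths under root whose lstat() fails with wanted_errno.

    The lstat parameter is a hypothetical hook so the check can be
    exercised without a damaged filesystem.
    """
    bad = set()

    def check(path):
        try:
            lstat(path)
        except OSError as exc:
            if exc.errno == wanted_errno:
                bad.add(path)

    def on_walk_error(exc):
        # os.walk() calls this when it cannot list a directory.
        if exc.errno == wanted_errno and exc.filename:
            bad.add(exc.filename)

    check(root)
    for dirpath, dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in dirnames + filenames:
            check(os.path.join(dirpath, name))
    return sorted(bad)
```

Calling find_unstattable("/usr/share/doc/texmf/help/Catalogue/entries") would list the entries that fail the same way romannum.html did, which may be handy to attach next to the xfs_repair log.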