One of my reiserfs disks became corrupted last week, and it's still causing
me problems. I'll try to describe it in full detail, hoping that this problem
can be fixed for good.

The disk in question is a Linux software RAID 0 array (/dev/md0) on 2x40GB
Maxtor IDE100 drives on a Promise FastTrak100 controller (which has raid
support, but I use Linux software raid instead).
When this started, my kernel was 2.4.8-ac9, and the machine had an uptime of
about 1 month running this kernel without problems.

I was cleaning up some files when ls started to act weird: it could list the
file names in my directories, but it failed to stat most of the files (I run
ls -F --color).

In my syslog I got:

Sep 25 16:06:13 monsterbob kernel: hdg: timeout waiting for DMA
Sep 25 16:06:13 monsterbob kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Sep 25 16:06:13 monsterbob kernel: hdg: status timeout: status=0x80 { Busy }
Sep 25 16:06:13 monsterbob kernel: hdg: drive not ready for command
Sep 25 16:06:15 monsterbob kernel: ide3: reset: success
Sep 25 16:06:20 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
Sep 25 16:06:20 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 8801. Fsck?
Sep 25 16:06:20 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [2091 2092 0x0 SD]
Sep 25 16:09:43 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11746. Fsck?
Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2100 0x0 SD]
Sep 25 16:09:43 monsterbob kernel: is_leaf: free space seems wrong: level=1, nr_items=1, free_space=0 rdkey
Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11749. Fsck?
Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2101 0x0 SD]
[etc...]
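
In case it helps anyone going through a longer log of this kind, here is a
rough sketch in Python that pulls out which drives reported DMA timeouts and
which blocks the vs-5150 message complained about. The regular expressions
only match the message formats seen in the excerpt above, and the function
name summarize is made up for illustration; other kernels may word these
messages differently.

```python
import re

# Sample lines taken from the syslog excerpt above, one message per line.
SAMPLE = """\
Sep 25 16:06:13 monsterbob kernel: hdg: timeout waiting for DMA
Sep 25 16:06:15 monsterbob kernel: ide3: reset: success
Sep 25 16:06:20 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 8801. Fsck?
Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11746. Fsck?
Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11749. Fsck?
"""

# Patterns are assumptions based only on the messages shown in this report.
DMA_TIMEOUT = re.compile(r"kernel: (hd[a-z]): timeout waiting for DMA")
BAD_BLOCK = re.compile(
    r"vs-5150: search_by_key: invalid format found in block (\d+)"
)


def summarize(log_text):
    """Return (sorted drive names with DMA timeouts, block numbers in log order)."""
    drives = set()
    blocks = []
    for line in log_text.splitlines():
        match = DMA_TIMEOUT.search(line)
        if match:
            drives.add(match.group(1))
            continue
        match = BAD_BLOCK.search(line)
        if match:
            blocks.append(int(match.group(1)))
    return sorted(drives), blocks


if __name__ == "__main__":
    drives, blocks = summarize(SAMPLE)
    print("drives with DMA timeouts:", drives)
    print("blocks reported by vs-5150:", blocks)
```

This only summarizes what the kernel logged; it says nothing about whether
the listed blocks are the only damaged ones.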

From what I can see, there was first a problem because my disks were sleeping
and they didn't spin up fast enough. Perhaps hdg was removed from my striped
raid or something, which confused reiserfs a lot.

I unmounted the partition, hoping that it would work when I remounted it, but
it failed:

[root@monsterbob root]# mount /mnt/disk
mount: Not a directory

In my syslog I got:

reiserfs: checking transaction log (device 09:00) ...
is_tree_node: node level 6425 does not match to the expected one 4
vs-5150: search_by_key: invalid format found in block 150545. Fsck?...
vs-13040: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
Using r5 hash to sort names
is_tree_node: node level 6425 does not match to the expected one 4
vs-5150: search_by_key: invalid format found in block 150545. Fsck?
vs-2140: finish_unfinished: search_by_key returned -2
ReiserFS version 3.6.25

I upgraded my kernel to 2.4.9-ac14 and ran reiserfsck from
reiserfsprogs-3.x.0j, but it segfaulted. The reiserfsck from
reiserfsprogs-3.x.0k-pre10 worked, so I ran

reiserfsck --rebuild-tree /dev/md0

and this fixed it. The disk worked for a few hours, then exactly the same
thing happened while the disks were spinning up.
As I write this I'm running --rebuild-tree again, but it seems that this
"cure" doesn't last very long.

It seems to me that I have a problem in the IDE layer somewhere below
reiserfs that needs to be worked out. Even so, it looks like a bug in reiserfs
that it corrupts my filesystem when the underlying device misbehaves, instead
of just giving up so that the partition would work the next time I mounted it.

Hoping this can be solved,
Jonas Jensen

PS: please CC me as I don't subscribe to this list.
