Re: [reiserfs-list] corrupted reiserfs
I had the same problem. I wanted to debug it, so I downloaded the source (Debian unstable), compiled it with CFLAGS=-g, and started it in gdb. Then it worked fine. The binary built by dpkg-buildpackage did not work, which must be because it is stripped. So it could be a compiler issue or a timing problem. I was working on a dumped image on a new disk, and swapping RAM made no difference, so I do not think hardware problems were involved.

-- 
Niels Elgaard Larsen
Re: [reiserfs-list] corrupted reiserfs
Hi

Jonas Jensen wrote:

> One of my reiserfs disks became corrupted last week, and it's still causing me problems. I'll try to describe it in full detail, hoping that this problem can be fixed for good.
>
> The disk in question is a Linux software raid 0 partition on 2x40GB Maxtor IDE100 drives on a Promise FastTrak100 controller (which has raid support, but I use Linux software raid instead). When this started, my kernel was 2.4.8-ac9, and the machine had an uptime of about 1 month running this kernel without problems.
>
> I wanted to clean up a bit, then ls started to act weird -- it could list the file names in my directories, but it failed to stat most of the files (I run ls -F --color). In my syslog I got:
>
> Sep 25 16:06:13 monsterbob kernel: hdg: timeout waiting for DMA
> Sep 25 16:06:13 monsterbob kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> Sep 25 16:06:13 monsterbob kernel: hdg: status timeout: status=0x80 { Busy }
> Sep 25 16:06:13 monsterbob kernel: hdg: drive not ready for command
> Sep 25 16:06:15 monsterbob kernel: ide3: reset: success
> Sep 25 16:06:20 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
> Sep 25 16:06:20 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 8801. Fsck?
> Sep 25 16:06:20 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [2091 2092 0x0 SD]
> Sep 25 16:09:43 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
> Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11746. Fsck?
> Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2100 0x0 SD]
> Sep 25 16:09:43 monsterbob kernel: is_leaf: free space seems wrong: level=1, nr_items=1, free_space=0 rdkey
> Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11749. Fsck?
> Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2101 0x0 SD]
> [etc...]
>
> From what I can see, there was first a problem because my disks were sleeping and they didn't spin up fast enough. Perhaps hdg was removed from my striped raid or something, which confused reiserfs a lot.
>
> I unmounted the partition, hoping that it would work when I remounted it, but it failed:
>
> [root@monsterbob root]# mount /mnt/disk
> mount: Not a directory
>
> In my syslog I got:
>
> reiserfs: checking transaction log (device 09:00) ...
> is_tree_node: node level 6425 does not match to the expected one 4
> vs-5150: search_by_key: invalid format found in block 150545. Fsck?...
> vs-13040: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
> Using r5 hash to sort names
> is_tree_node: node level 6425 does not match to the expected one 4
> vs-5150: search_by_key: invalid format found in block 150545. Fsck?
> vs-2140: finish_unfinished: search_by_key returned -2
> ReiserFS version 3.6.25
>
> I upgraded my kernel to 2.4.9-ac14, then I did reiserfsck with reiserfsprogs-3.x.0j, but it segfaulted. reiserfsprogs-3.x.0k-pre10 worked, so I did reiserfsck --rebuild-tree /dev/md0 and this fixed it.
>
> The disk worked for a few hours, then exactly the same thing happened while the disks were spinning up. While writing this, I'm doing rebuild-tree again, but it seems that this cure doesn't last very long.
>
> It seems to me that I have a problem with my IDE somewhere below reiserfs that needs to be worked out. However, it still seems to be a bug in reiserfs that corrupts my filesystem when it gets confused, instead of just giving up so it would work the next time I remounted the partition.

IMHO, when hardware starts to fail - it is time to think about changing it. Reiserfs has no way to know when it should give up. It sends correct data to disk, broken hardware writes it wrong. Who corrupted the data then?

The worst thing in your case is (as it looks to me) that you do not have unreadable blocks in certain places but that the hard disk fails randomly.

Anyway, next time your data becomes available, you should find a way to back it up onto reliable hardware.

Thanks,
vs

> Hoping this can be solved,
> Jonas Jensen
>
> PS: please CC me as I don't subscribe to this list.
Re: [reiserfs-list] corrupted reiserfs
On Monday 01 October 2001 12:01, you wrote:

> IMHO, when hardware starts to fail - it is time to think about changing it. Reiserfs has no way to know when it should give up. It sends correct data to disk, broken hardware writes it wrong. Who corrupted the data then?
>
> The worst thing in your case is (as it looks to me) that you do not have unreadable blocks in certain places but that the hard disk fails randomly.

You have a valid point. The thing is just that I can see from my syslog that reiserfs knew something was very wrong, yet it just kept going. Perhaps it should stop writing to the disk at that point (note that I didn't perform any write operations, just ls and cd (unless modifying atime counts as a write; I think I forgot to mount with noatime :-( )).

> Anyway, next time your data becomes available, you should find a way to back it up onto reliable hardware.

Here's my new problem: I can't get the data back this time. Can someone give me some good ideas on how to get my data back?

I ran reiserfsck --rebuild-tree /dev/md0 and it aborted during pass 1, as shown here:

20%40%60%80%100% left 0, 545 /sec
not set got 5146 hits
r5 got 1154 hits
Flushing..done
Read blocks (but not data blocks) 18251667
Leaves among those 6791
- corrected leaves 5366
- leaves all contents of which could not be saved and deleted 3
pointers in indirect items to wrong area 1005357 (zeroed)
Objectids found 804
Pass 1 (will try to insert 6788 leaves):
### Pass 1 ###
Looking for allocable blocks .. ok
0%build_the_tree: nothing but leaves are expected. Block 8212 - ?? 6787, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8241 - ?? 6786, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8242 - ?? 6785, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8243 - ?? 6784, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8245 - ?? 6783, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8246 - ?? 6782, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8250 - ?? 6781, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8255 - ?? 6780, 0 /sec
left 6779, 0 /sec
mark_block_used: (2049648) used already
Aborted (core dumped)

When I try to mount the disk, it seems that it's completely trashed this time.

[root@monsterbob fsck]# mount /mnt/disk
mount: wrong fs type, bad option, bad superblock on /dev/md0, or too many mounted file systems

Trying to rebuild-tree again, it coredumps:

Pass 0:
### Pass 0 ###
Loading on-disk bitmap .. ok, 18260601 blocks marked used
bit 20021248, bitsize 20009824
reiserfsck: bitmap.c:134: reiserfs_bitmap_test_bit: Assertion `bit_number < bm->bm_bit_size' failed.
Aborted (core dumped)

So I hope someone has an idea on how to save my data, or else I'll have to reformat :-(.

Thanks,
Jonas Jensen
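
The numbers in the second run hint at why the assertion fired: the lookup asked for bit 20021248, while the bitmap reportedly holds only 20009824 bits. The following is a minimal Python sketch of a bounds-checked bitmap that fails in the same way. The Bitmap class is hypothetical and only mimics the range check implied by the assertion message; it is not the reiserfsprogs code. The 4096-byte block size is also an assumption.

```python
class Bitmap:
    """Toy fixed-size bitmap with a range check on every access."""

    def __init__(self, bit_size):
        self.bit_size = bit_size
        self.bits = bytearray((bit_size + 7) // 8)

    def set_bit(self, n):
        assert n < self.bit_size, "bit number out of range"
        self.bits[n // 8] |= 1 << (n % 8)

    def test_bit(self, n):
        assert n < self.bit_size, "bit number out of range"
        return bool(self.bits[n // 8] & (1 << (n % 8)))


BITSIZE = 20009824   # "bitsize" from the fsck output
BAD_BIT = 20021248   # "bit" from the fsck output
BLOCK_SIZE = 4096    # assumption: 4 KiB filesystem blocks

# How far past the end of the bitmap the failing lookup was.
overshoot = BAD_BIT - BITSIZE
# Bytes covered by the bitmap under the block-size assumption.
fs_bytes = BITSIZE * BLOCK_SIZE
```

Calling Bitmap(BITSIZE).test_bit(BAD_BIT) raises AssertionError, which corresponds to the abort above. A block number that far out of range would fit with a damaged block pointer still being present on disk, but that is a guess, not something the output proves.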
Re: [reiserfs-list] corrupted reiserfs
On Monday 01 October 2001 19:26, you wrote:

> Yes, please find a reliable hard drive and back up the broken filesystem there via
>
> dd if=/dev/filesystem-with-problem of=/dev/reliable-device bs=4096 conv=notrunc
>
> Then we will be able to recover your data. Please let us know when you are done with that data move.

I can't get an 80GB disk... I might be able to get a 75GB one, but it would cost me. However, I believe my hardware is reliable now that I've turned off DMA and sleep (from what I can see, the problem was caused by the combination of these options, as my syslog said "timeout waiting for DMA", and I know the problem happened while the disks were waking up). The disks had been running in my Windows machine for many months and in my Linux machine for about 1 month without optimisations and without problems.

/Jonas Jensen
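
The dd command quoted above copies the device in 4096-byte blocks. As a rough illustration of what such an image copy involves, here is a minimal Python sketch that also keeps going past unreadable blocks by zero-filling them. That behaviour is similar in spirit to dd's conv=noerror,sync, which is not what was suggested in the thread. The function name copy_image and the zero-fill policy are assumptions made for illustration, and the sketch is no substitute for dd.

```python
import os


def copy_image(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    Returns (bytes_copied, bad_blocks). Hypothetical helper for illustration.
    """
    bad_blocks = []
    offset = 0
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or block device.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    # Unreadable block: note it and keep the image aligned.
                    bad_blocks.append(offset // block_size)
                    data = b"\0" * want
                if not data:
                    break  # unexpected end of input
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src)
    return offset, bad_blocks
```

A call might look like copy_image("/dev/md0", "/mnt/backup/md0.img"), where the destination path is made up. The destination needs at least as much free space as the source device, which is the constraint behind the 80GB versus 75GB remark above.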
[reiserfs-list] corrupted reiserfs
One of my reiserfs disks became corrupted last week, and it's still causing me problems. I'll try to describe it in full detail, hoping that this problem can be fixed for good.

The disk in question is a Linux software raid 0 partition on 2x40GB Maxtor IDE100 drives on a Promise FastTrak100 controller (which has raid support, but I use Linux software raid instead). When this started, my kernel was 2.4.8-ac9, and the machine had an uptime of about 1 month running this kernel without problems.

I wanted to clean up a bit, then ls started to act weird -- it could list the file names in my directories, but it failed to stat most of the files (I run ls -F --color). In my syslog I got:

Sep 25 16:06:13 monsterbob kernel: hdg: timeout waiting for DMA
Sep 25 16:06:13 monsterbob kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Sep 25 16:06:13 monsterbob kernel: hdg: status timeout: status=0x80 { Busy }
Sep 25 16:06:13 monsterbob kernel: hdg: drive not ready for command
Sep 25 16:06:15 monsterbob kernel: ide3: reset: success
Sep 25 16:06:20 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
Sep 25 16:06:20 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 8801. Fsck?
Sep 25 16:06:20 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [2091 2092 0x0 SD]
Sep 25 16:09:43 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11746. Fsck?
Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2100 0x0 SD]
Sep 25 16:09:43 monsterbob kernel: is_leaf: free space seems wrong: level=1, nr_items=1, free_space=0 rdkey
Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11749. Fsck?
Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2101 0x0 SD]
[etc...]

From what I can see, there was first a problem because my disks were sleeping and they didn't spin up fast enough. Perhaps hdg was removed from my striped raid or something, which confused reiserfs a lot.

I unmounted the partition, hoping that it would work when I remounted it, but it failed:

[root@monsterbob root]# mount /mnt/disk
mount: Not a directory

In my syslog I got:

reiserfs: checking transaction log (device 09:00) ...
is_tree_node: node level 6425 does not match to the expected one 4
vs-5150: search_by_key: invalid format found in block 150545. Fsck?...
vs-13040: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
Using r5 hash to sort names
is_tree_node: node level 6425 does not match to the expected one 4
vs-5150: search_by_key: invalid format found in block 150545. Fsck?
vs-2140: finish_unfinished: search_by_key returned -2
ReiserFS version 3.6.25

I upgraded my kernel to 2.4.9-ac14, then I did reiserfsck with reiserfsprogs-3.x.0j, but it segfaulted. reiserfsprogs-3.x.0k-pre10 worked, so I did reiserfsck --rebuild-tree /dev/md0 and this fixed it.

The disk worked for a few hours, then exactly the same thing happened while the disks were spinning up. While writing this, I'm doing rebuild-tree again, but it seems that this cure doesn't last very long.

It seems to me that I have a problem with my IDE somewhere below reiserfs that needs to be worked out. However, it still seems to be a bug in reiserfs that corrupts my filesystem when it gets confused, instead of just giving up so it would work the next time I remounted the partition.

Hoping this can be solved,
Jonas Jensen

PS: please CC me as I don't subscribe to this list.
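
The reasoning in this post, that an IDE timeout came first and the reiserfs errors followed, can be checked mechanically against a syslog. Below is a minimal Python sketch that assumes the classic syslog layout seen above ("Mon DD HH:MM:SS host kernel: message"). The function name, the 60-second window and the default year are assumptions for illustration; syslog lines carry no year, so one has to be supplied.

```python
import re
from datetime import datetime, timedelta

# Timestamp, host name, then the kernel message.
LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (.*)$")


def errors_after_dma_timeout(lines, window=timedelta(seconds=60), year=2001):
    """Return reiserfs 'vs-' messages logged within `window` of a DMA timeout."""
    last_timeout = None
    hits = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if "timeout waiting for DMA" in msg:
            last_timeout = stamp
        elif msg.startswith("vs-") and last_timeout is not None:
            if stamp - last_timeout <= window:
                hits.append(msg)
    return hits


# Three lines taken from the log in this post.
SAMPLE = [
    "Sep 25 16:06:13 monsterbob kernel: hdg: timeout waiting for DMA",
    "Sep 25 16:06:20 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 8801. Fsck?",
    "Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11746. Fsck?",
]

close_hits = errors_after_dma_timeout(SAMPLE)
```

With the default window only the 16:06:20 message counts as following the timeout; widening the window to five minutes picks up the 16:09:43 message as well. Proximity in time is only a hint and does not by itself show that the timeout caused the corruption.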