Re: [reiserfs-list] corrupted reiserfs

2006-11-05 Thread elgaard


I had the same problem. I wanted to debug it, so I downloaded the
source (Debian unstable), compiled it with CFLAGS=-g and started it
in gdb.

Then it worked fine. The result of dpkg-buildpackage did not work, which must
be because it is stripped.
So it could be a compiler issue or a timing problem.

I was working on a dumped image on a new disk, and swapping RAM made no
difference, so I do not think hardware problems were involved.

-- 
Niels Elgaard Larsen


Re: [reiserfs-list] corrupted reiserfs

2001-10-01 Thread Vladimir V. Saveliev

Hi

Jonas Jensen wrote:

 One of my reiserfs disks became corrupted last week, and it's still causing
 me problems. I'll try to describe it in full detail, hoping that this problem
 can be fixed for good.

 The disk in question is a Linux software raid 0 partition on 2x40GB Maxtor
 IDE100 drives on a Promise FastTrak100 controller (which has raid support,
 but I use Linux software raid instead).
 When this started, my kernel was 2.4.8-ac9, and the machine had an uptime of
 about 1 month running this kernel without problems.

 I wanted to clean up a bit, then ls started to act weird -- it could list the
 file names in my directories, but it failed to stat most of the files (I run
 ls -F --color).

 In my syslog I got:

 Sep 25 16:06:13 monsterbob kernel: hdg: timeout waiting for DMA
 Sep 25 16:06:13 monsterbob kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
 Sep 25 16:06:13 monsterbob kernel: hdg: status timeout: status=0x80 { Busy }
 Sep 25 16:06:13 monsterbob kernel: hdg: drive not ready for command
 Sep 25 16:06:15 monsterbob kernel: ide3: reset: success
 Sep 25 16:06:20 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
 Sep 25 16:06:20 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 8801. Fsck?
 Sep 25 16:06:20 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [2091 2092 0x0 SD]
 Sep 25 16:09:43 monsterbob kernel: is_tree_node: node level 0 does not match to the expected one 1
 Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11746. Fsck?
 Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2100 0x0 SD]
 Sep 25 16:09:43 monsterbob kernel: is_leaf: free space seems wrong: level=1, nr_items=1, free_space=0 rdkey
 Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11749. Fsck?
 Sep 25 16:09:43 monsterbob kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [3 2101 0x0 SD]
 [etc...]
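
 To see at a glance which blocks the kernel flagged, the vs-5150 messages
 can be scanned for their block numbers. A minimal Python sketch, assuming
 each message sits on a single line; the name flagged_blocks is made up for
 illustration, and the sample lines are copied from the log above:

```python
import re

# Captures N in "vs-5150: search_by_key: invalid format found in block N".
BLOCK_RE = re.compile(
    r"vs-5150: search_by_key: invalid format found in block (\d+)"
)

def flagged_blocks(lines):
    """Return block numbers from vs-5150 messages, in order, without repeats."""
    seen = []
    for line in lines:
        match = BLOCK_RE.search(line)
        if match:
            block = int(match.group(1))
            if block not in seen:
                seen.append(block)
    return seen

sample = [
    "Sep 25 16:06:20 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 8801. Fsck?",
    "Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11746. Fsck?",
    "Sep 25 16:09:43 monsterbob kernel: vs-5150: search_by_key: invalid format found in block 11749. Fsck?",
]
print(flagged_blocks(sample))
```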

 From what I can see, there was first a problem because my disks were sleeping
 and they didn't spin up fast enough. Perhaps hdg was removed from my striped
 raid or something, which confused reiserfs a lot.

 I unmounted the partition, hoping that it would work when I remounted it, but
 it failed:

 [root@monsterbob root]# mount /mnt/disk
 mount: Not a directory

 In my syslog I got:

 reiserfs: checking transaction log (device 09:00) ...
 is_tree_node: node level 6425 does not match to the expected one 4
 vs-5150: search_by_key: invalid format found in block 150545. Fsck?...
 vs-13040: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
 Using r5 hash to sort names
 is_tree_node: node level 6425 does not match to the expected one 4
 vs-5150: search_by_key: invalid format found in block 150545. Fsck?
 vs-2140: finish_unfinished: search_by_key returned -2
 ReiserFS version 3.6.25

 I upgraded my kernel to 2.4.9-ac14, then I did reiserfsck with
 reiserfsprogs-3.x.0j, but it segfaulted. reiserfsprogs-3.x.0k-pre10 worked,
 so I did reiserfsck --rebuild-tree /dev/md0
 and this fixed it. The disk worked for a few hours, then exactly the same
 thing happened while the disks were spinning up.
 While writing this, I'm doing rebuild-tree again, but it seems that this
 cure doesn't last very long.

 It seems to me that I have a problem with my IDE somewhere below reiserfs
 that needs to be worked out. However, it still seems to be a bug in reiserfs
 that corrupts my filesystem when it gets confused, instead of just giving up
 so it would work the next time I remounted the partition.


IMHO, when hardware starts to fail, it is time to think about replacing it.
Reiserfs has no way to know when it should give up. It sends correct data to
disk, and broken hardware writes it wrong. Who corrupted the data then?
The worst thing in your case is (as it looks to me) that you do not have
unreadable blocks in certain places; instead the hard disk fails randomly.

Anyway, the next time your data becomes available, you should find a way to
back it up onto reliable hardware.

Thanks,
vs



 Hoping this can be solved,
 Jonas Jensen

 PS: please CC me as I don't subscribe to this list.



Re: [reiserfs-list] corrupted reiserfs

2001-10-01 Thread Jonas Jensen

On Monday 01 October 2001 12:01, you wrote:
 IMHO, when hardware starts to fail, it is time to think about replacing it.
 Reiserfs has no way to know when it should give up. It sends correct data to
 disk, and broken hardware writes it wrong. Who corrupted the data then?
 The worst thing in your case is (as it looks to me) that you do not have
 unreadable blocks in certain places; instead the hard disk fails randomly.

You have a valid point. The thing is just that I can see from my syslog that 
reiserfs knew something was very wrong, yet it just kept going. Perhaps it 
should stop writing to the disk at that point (note that I didn't perform any 
write operations, just ls and cd (unless modifying atime counts as write, 
I think I forgot to mount with noatime :-( )).

 Anyway, the next time your data becomes available, you should find a way to
 back it up onto reliable hardware.

Here's my new problem: I can't get the data back this time. Can someone give 
me some good ideas on how to get my data back? I ran 
reiserfsck --rebuild-tree /dev/md0
and it segfaulted during pass 1 as shown here:

20%40%60%80%100% left 0, 545 /sec
not set got 5146 hits
r5 got 1154 hits
Flushing..done
Read blocks (but not data blocks) 18251667
Leaves among those 6791
- corrected leaves 5366
- leaves all contents of which could not be saved and deleted 3
pointers in indirect items to wrong area 1005357 (zeroed)
Objectids found 804

Pass 1 (will try to insert 6788 leaves):
### Pass 1 ###
Looking for allocable blocks .. ok
0%build_the_tree: nothing but leaves are expected. Block 8212 - ?? 6787, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8241 - ?? 6786, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8242 - ?? 6785, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8243 - ?? 6784, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8245 - ?? 6783, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8246 - ?? 6782, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8250 - ?? 6781, 0 /sec
build_the_tree: nothing but leaves are expected. Block 8255 - ?? 6780, 0 /sec
left 6779, 0 /sec
mark_block_used: (2049648) used already


Aborted (core dumped)


When I try to mount the disk, it seems that it's completely trashed this time.

[root@monsterbob fsck]# mount /mnt/disk
mount: wrong fs type, bad option, bad superblock on /dev/md0,
   or too many mounted file systems


Trying to rebuild-tree again, it coredumps:

Pass 0:
### Pass 0 ###
Loading on-disk bitmap .. ok, 18260601 blocks marked used
bit 20021248, bitsize 20009824
reiserfsck: bitmap.c:134: reiserfs_bitmap_test_bit: Assertion `bit_number < bm->bm_bit_size' failed.
Aborted (core dumped)
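
The two numbers in that output seem to explain the abort: bit 20021248 lies
past the end of a bitmap that holds 20009824 bits, so what looks like a range
check in reiserfs_bitmap_test_bit fails. A minimal Python sketch of that kind
of check, not the reiserfsprogs code; the Bitmap class and its fields are
hypothetical stand-ins:

```python
class Bitmap:
    """Toy block bitmap for illustration; not the reiserfsprogs structure."""

    def __init__(self, bit_size):
        self.bit_size = bit_size
        self.bits = bytearray((bit_size + 7) // 8)

    def in_range(self, bit_number):
        # A bit is addressable only if it falls inside the bitmap.
        return 0 <= bit_number < self.bit_size

    def test_bit(self, bit_number):
        # Refuse to look outside the bitmap instead of reading stray memory.
        if not self.in_range(bit_number):
            raise IndexError(
                f"bit {bit_number} outside bitmap of {self.bit_size} bits"
            )
        return bool(self.bits[bit_number // 8] & (1 << (bit_number % 8)))

bm = Bitmap(20009824)          # "bitsize" from the fsck output
print(bm.in_range(20021248))   # "bit" from the fsck output
```

The requested bit is 11424 past the end of the bitmap, which would fit a
block pointer damaged by the earlier corruption, though that is only a guess.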


So I hope someone has an idea on how to save my data, or else I'll have to 
reformat :-(.

Thanks,
Jonas Jensen



Re: [reiserfs-list] corrupted reiserfs

2001-10-01 Thread Jonas Jensen

On Monday 01 October 2001 19:26, you wrote:
 Yes, please find a reliable hard drive and back up the broken filesystem
 there via
 dd if=/dev/filesystem-with-problem of=/dev/reliable-device bs=4096 conv=notrunc

 Then we will be able to recover your data. Please let us know when you are
 done with that data move.
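
 The copy suggested above moves the damaged device in 4096-byte blocks. For
 a disk that fails at random, one possible variation is to carry on past
 unreadable blocks and fill them with zeros so that offsets in the copy stay
 aligned with the source. A rough Python sketch of that idea, assuming a
 failed read surfaces as OSError; copy_image and its parameters are made-up
 names, and this is not what the dd command above does by itself:

```python
import os

def copy_image(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block, zero-filling blocks that cannot be read.

    Returns the indexes of the blocks that had to be zero-filled.
    """
    bad_blocks = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        total = os.lseek(src, 0, os.SEEK_END)  # size of the file or device
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < total:
                want = min(block_size, total - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    data = b""
                    bad_blocks.append(offset // block_size)
                # Pad short or failed reads so offsets match the source.
                dst.write(data.ljust(want, b"\0"))
                offset += want
    finally:
        os.close(src)
    return bad_blocks
```

 Called as copy_image("/dev/filesystem-with-problem", "/dev/reliable-device"),
 it returns the list of block indexes that could not be read, which gives a
 starting point for judging how much of the image is usable.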

I can't get an 80GB disk... I might be able to get a 75GB one, but it would
cost me.
However, I believe my hardware is reliable now that I've turned off DMA and
sleep. From what I can see, the problem was caused by the combination of these
options: my syslog said "timeout waiting for DMA", and I know the problem
happened while the disks were waking up. The disks had been running in my
Windows machine for many months and in my Linux machine for about 1 month
without optimisations and without problems.

/Jonas Jensen