
A server of mine suddenly started acting very wonky and spontaneously
rebooting. I've been pulling my hair out trying to narrow down what the
problem was.

Today after one of the reboots, it showed a lot of damage to several of
the data partitions (not the system ones luckily), so I ran fsck on them
to see about cleaning up. For two of them, /dev/hde[56], I only had to
get down to --rebuild-tree to fix them up. However, the last one,
/dev/hdb1, looks to be far worse.

The system refused to mount it originally, so I ran just plain
--fix-fixable. It showed nothing wrong at all. By a fluke of terminals,
I have a copy of this first output

However, the system still refused to mount the drive, showing this in
the logs:
Mar 30 22:14:22 [kernel] read_super_block: can't find a reiserfs filesystem on (dev 03:41, block 64, size 1024)
Mar 30 22:14:22 [kernel] read_super_block: can't find a reiserfs filesystem on (dev 03:41, block 8, size 1024)
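
For anyone reading along, the "dev 03:41, block N, size 1024" part of those kernel messages can be decoded by hand or with a few lines of script. The following is only a sketch: it assumes the 2.4 kernel prints the device as hexadecimal major:minor, and that the usual IDE numbering applies (major 3 for hda/hdb, 64 minors per disk). The function names are made up for this illustration.

```python
import re

# Assumption: conventional IDE block-device majors, two disks per major,
# 64 minor numbers reserved per disk (minor 0 is the whole disk).
IDE_MAJORS = {3: "ab", 22: "cd", 33: "ef", 34: "gh"}

LOG_PATTERN = re.compile(
    r"dev ([0-9a-fA-F]{2}):([0-9a-fA-F]{2}), block (\d+), size (\d+)"
)


def decode_log_line(line):
    """Extract major, minor and the probed byte offset from a log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised log line")
    major = int(match.group(1), 16)   # printed in hex
    minor = int(match.group(2), 16)   # printed in hex
    block = int(match.group(3))
    size = int(match.group(4))
    return {"major": major, "minor": minor, "byte_offset": block * size}


def ide_device_name(major, minor):
    """Map an IDE major/minor pair to its conventional /dev name."""
    letters = IDE_MAJORS[major]
    disk, partition = divmod(minor, 64)
    name = "/dev/hd" + letters[disk]
    return name + (str(partition) if partition else "")


if __name__ == "__main__":
    line = ("read_super_block: can't find a reiserfs filesystem on "
            "(dev 03:41, block 64, size 1024)")
    info = decode_log_line(line)
    print(ide_device_name(info["major"], info["minor"]), info["byte_offset"])
```

Under those assumptions, 03:41 is /dev/hdb1, and the two probes correspond to byte offsets 65536 and 8192 on the partition.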

Mount showed this:
server1 kernel # mount /dev/hdb1
mount: wrong fs type, bad option, bad superblock on /dev/hdb1,
       or too many mounted file systems

This baffled me, so I decided it wouldn't hurt to run --rebuild-sb and
--rebuild-tree. For rebuild-sb, I don't have the full output, but I did
get: "Super block seems to be correct"

I ran rebuild-tree, and saw no errors. 

The drive still refused to mount.
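
One read-only check that might help narrow down which side is wrong is to look for the reiserfs magic string at the two locations the kernel probed (block 64 and block 8, at 1024 bytes each). The sketch below assumes the magic string sits 52 bytes into the superblock and that the known magic values are "ReIsErFs", "ReIsEr2Fs" and "ReIsEr3Fs"; the function name is hypothetical, and this does not validate anything else in the superblock.

```python
# Byte offsets taken from the kernel messages: block 64 and block 8,
# each with a block size of 1024 bytes.
SUPERBLOCK_OFFSETS = (64 * 1024, 8 * 1024)

# Assumption: the magic string starts 52 bytes into the superblock.
MAGIC_OFFSET = 52

# Assumption: magic strings for the 3.6, non-standard-journal and 3.5 formats.
MAGICS = (b"ReIsEr2Fs", b"ReIsEr3Fs", b"ReIsErFs")


def find_reiserfs_magic(path):
    """Return (superblock_offset, magic) pairs found on a device or image.

    The file is opened read-only, so nothing on disk is modified.
    """
    found = []
    with open(path, "rb") as dev:
        for base in SUPERBLOCK_OFFSETS:
            dev.seek(base + MAGIC_OFFSET)
            data = dev.read(10)
            for magic in MAGICS:
                if data.startswith(magic):
                    found.append((base, magic.decode("ascii")))
                    break
    return found


if __name__ == "__main__":
    import sys
    # Example: python probe.py /dev/hdb1   (needs read access to the device)
    for offset, magic in find_reiserfs_magic(sys.argv[1]):
        print("offset %d: %s" % (offset, magic))
```

An empty result would suggest the kernel is right that it cannot see a superblock at either location; a hit would suggest the superblock is present but rejected for some other reason.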

I dug in fsck.reiserfs --help, and saw '--scan-whole-partition'. Tried
--rebuild-tree with that on. It showed a LOT of stuff about StatDatas,
and completed successfully.

Still no joy in mounting it.

I repeated this, and still got a massive pile of errors.

Kernel: 2.4.20-gentoo-r1 (amongst the patches are: 3.5G-address-space,
rmap-14a, futuxes, imon, missing.list, preempt-ac, supermount, freeswan,
and lots of netfilter stuff; however, at a glance I don't see much that
should affect things, I think)
Hardware: Pentium III (Katmai) 500MHz CPU (replaced with identical known
good one in hardware testing). 512MB PC133 SDRAM (replaced with identical
known good part).
Motherboard: Abit AH6 Rev 1.01 (was Rev 1.00 previously) [Intel 440BX].
reiserfsprogs: 3.6.4

I'm not exactly sure what had the partition open at the time, but I am
reasonably confident that it was NFS.

The partition (~10GiB in size) hosts several dedicated game server
binaries and related files, exported over NFS (readonly+async) to other
diskless machines that actually run the games. (Logs are written to a
different NFS server.)

I'm going to wipe the seemingly corrupt contents of the drive later this
week if I don't manage to fix it. But as with all data it would make
life as a SysAdmin so much easier if data just came back magically 8-).
(I think I've spent too much of the weekend playing AD&D).

I just find that there is something definitely wrong if fsck says the
partition is fine, but Linux refuses to mount it. Either this is a bug
in Linux, or in reiserfsprogs. Either way, somebody has a bug :-)

Robin Hugh Johnson
Home Page  :
ICQ#       : 30269588 or 41961639
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85
