On Thursday 21 October 2004 18:16, [EMAIL PROTECTED] wrote:
> I have a server running a vanilla 2.4.26 SMP kernel, Adaptec SCSI RAID5
> w/ reiserfs, which has been running fine for about 9 months.
> 
> I recently (last 3 days) started doing some backups with the most current
> rsync (2.6.3), just reading from the disk in this machine and writing to a
> remote machine. That's the only thing I can think of that's changed, besides
> adding an ntp server, but I am assuming that's not it. While this rsync job
> was running, there was also a samba share I am reading and translating data
> from; that job has been running since the machine was installed, though, and
> never caused a problem.
> 
> This morning the machine locked up: it still responded to pings, but
> everything else seemed completely dead. Checking the error logs after
> reboot I see this:
> 
> 
> Oct 21 04:15:04 hsa10 kernel: sd(8,3):vs-13060: reiserfs_update_sd: stat data of object [25333038 23882091 0x0 SD] (nlink == 1) not found (pos 33)
> Oct 21 04:15:04 hsa10 last message repeated 3 times
> Oct 21 04:15:04 hsa10 kernel: sd(8,3):PAP-12350: do_balance: insert_size == 0, mode == p
> Oct 21 04:15:04 hsa10 kernel:  <4>sd(8,3):vs-13060: reiserfs_update_sd: stat data of object [25333038 23882091 0x0 SD] (nlink == 1) not found (pos 33)
> Oct 21 04:15:04 hsa10 kernel: sd(8,3):vs-13060: reiserfs_update_sd: stat data of object [25333038 23882091 0x0 SD] (nlink == 1) not found (pos 33)
> 
> and on and on until an hour and 40 minutes later when the logs end.
> 
> these numbers: [25333038 23882091 0x0 SD]  and vs-13060 do not change in the
> rest of the log.
> 
> there are other messages that aren't exactly like the above mixed in like:
> 
> Oct 21 04:18:00 hsa10 kernel: sd(8,3):PAP-5660: reiserfs_do_truncate: wrong
> result -1 of search for [25333038 23882091 0xfffffffffffffff DIRECT]
> 
> also I noticed all these retries happen at around 00 seconds after the
> minute (top of the minute)
> 
> So what should I do? What could cause this as well? We have a write cache
> (another situation where there can apparently be this kind of error, during
> power loss) on this server, but it's also battery backed up so that
> shouldn't be an issue (according to what I read).
> 
> on reboot the dmesg spews:
> 
> reiserfs: found format "3.6" with standard journal
> reiserfs: checking transaction log (device sd(8,3)) ...
> for (sd(8,3))
> reiserfs: replayed 63 transactions in 0 seconds
> sd(8,3):Using r5 hash to sort names
> 
> I see some people who have gotten this type of error told
> to rebuild the filesystem, is that suggested in all cases? I don't want to
> have to do that if it's not needed. We do have back ups but even so ...
> 
> the server is rebooted and seems to be running fine now.
> 
> Any help is appreciated.
> 
> brian
> 

Power loss + enabled write cache should not be a problem if the scsi 
controller has a battery pack. That's what it's there for.

Try to reiserfsck the filesystem with the most recent tools available at:
ftp://ftp.namesys.com/pub/reiserfsprogs/reiserfsprogs-3.6.19.tar.gz

You can check the version you have with:
# reiserfsck -V

To repair, first umount the fs or remount it read only and do:
# reiserfsck --check /dev/sda3

After this run, reiserfsck will suggest a method of repair which you
should follow. If --fix-fixable is sufficient it will suggest it, but
if it cannot you'll have to use --rebuild-tree.
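
The same decision can also be read from the exit status of the --check
run. A minimal sketch, assuming the exit codes listed in the reiserfsck(8)
man page of recent reiserfsprogs releases (0 = no errors, 4 = needs
--rebuild-tree, 6 = needs --fix-fixable); please verify them against the
man page of your installed version. The helper name next_step is made up
for this example:

```shell
#!/bin/sh
# next_step is a hypothetical helper: it maps the exit status of
# "reiserfsck --check" to the follow-up the tool is expected to suggest.
# The status values are an assumption taken from the reiserfsck(8) man page.
next_step() {
    case "$1" in
        0)  echo "no corruption found" ;;
        4)  echo "run reiserfsck --rebuild-tree (back up the partition first)" ;;
        6)  echo "run reiserfsck --fix-fixable" ;;
        8)  echo "operational error, check device and permissions" ;;
        16) echo "usage or syntax error" ;;
        *)  echo "unexpected status $1, read the reiserfsck output" ;;
    esac
}

# Usage (as root, on the unmounted or read-only filesystem):
#   reiserfsck --check /dev/sda3
#   next_step $?
```

The text reiserfsck prints at the end of the run remains the authoritative
recommendation; the status code is only a convenient summary of it.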

Note: --rebuild-tree will also pick up files stored inside vmware images
  or loopback files that contain reiserfs filesystems and mix them into
  the rebuilt tree, because the reiserfs design lacks support for
  per-filesystem unique keys.
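
Before a --rebuild-tree run it can help to know which files on the
partition are themselves reiserfs images, so they can be moved elsewhere
first. A rough sketch, assuming the standard on-disk layout (superblock at
64 KiB, magic string at byte 52 of the superblock, i.e. file offset 65588).
It only recognises raw loopback-style images, not vmware disk files, which
wrap the filesystem in their own container format. The function name and
the /mnt/data path are hypothetical:

```shell
#!/bin/sh
# looks_like_reiserfs_image FILE
# Returns 0 if FILE carries a reiserfs magic string at the place where a
# raw filesystem image would have it (assumed offset: 65536 + 52 = 65588).
looks_like_reiserfs_image() {
    # Read 9 bytes at the assumed magic offset; drop NUL bytes so the
    # shorter 3.5 magic ("ReIsErFs") compares cleanly.
    magic=$(dd if="$1" bs=1 skip=65588 count=9 2>/dev/null | tr -d '\000')
    case "$magic" in
        ReIsErFs*|ReIsEr2Fs*|ReIsEr3Fs*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: list candidate image files larger than 1 MiB under a mount point
#   find /mnt/data -type f -size +1024k | while IFS= read -r f; do
#       looks_like_reiserfs_image "$f" && echo "$f"
#   done
```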

-- 
lg, Chris
