[reiserfs-list] Filesystem Corruption

2002-06-06 Thread Kurt

  
Kurt <[EMAIL PROTECTED]>, 06/06/2002 02:00 PM

Hello all,
I currently have a system configured as follows:
1) LVM version 1.0.1-rc4(ish)(03/10/2001)
2) /dev/PROJ/proj on /proj type reiserfs (rw,noatime,notail)
3) /dev/PROJ/proj  239G  142G  97G  60%  /proj
4) Kernel 2.4.17 with reiserfs tools 3.x.0k
5) ReiserFS compiled in (CONFIG_REISERFS_CHECK set to NO)
6) 256 MB RAM ("sar -r" shows memory usage is not abnormal for this box)
7) Tons of very small files generated by log processing
My co-worker tells me that the system became unresponsive and showed
reiserfs-related errors on the console.
After the restart, they noticed that the file
/proj/webtrends/receive/bama/www3/access.01Jun.r.gz was unreadable by root
(permission denied).
I ran reiserfsck on the drive, and it reported an error stating that
access.01Jun.r.gz pointed to nowhere.
I was unable to complete a reiserfsck --fix-fixable because the fsck took
too long for an unscheduled downtime.
I will attempt the fsck again over the weekend, but in the meantime I would
like to know whether anyone else has seen this problem, and what steps they
took to fix it.
-Kurt



--

Kurt Palmer  SysAdmin
[EMAIL PROTECTED]  Advance Internet
201-459-2846






Re: [reiserfs-list] Filesystem Corruption

2002-06-06 Thread Oleg Drokin

Hello!

On Thu, Jun 06, 2002 at 02:00:01PM -0400, Kurt wrote:

> error stating the file pointed to nowhere.
> I was unable to complete a reiserfsck --fix-fixable because of the length of 
> time that this (fsck) process took since this was an unscheduled downtime.
> During the weekend I will attempt to do the fsck again, however I really
> needed to know if this problem has been observed by anyone else, and what 
> steps they took to fix the problem.

We recommend that you upgrade your kernel to 2.4.18.
To determine the exact problem, it would be very useful if you could post
excerpts from the kernel logs showing the actual errors.
Thank you.

Bye,
Oleg
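
Oleg asks for kernel log excerpts. Below is a minimal sketch of pulling the reiserfs-related lines out of a syslog file before posting them. It assumes the message prefixes seen in this thread ("vs-NNNNN:", "is_leaf:", "is_tree_node:"). `reiserfs_excerpts` and `REISERFS_MSG` are hypothetical names, and the pattern is not a complete list of reiserfs messages.

```python
import re
from typing import Iterable, List

# Hypothetical filter: the reiserfs message prefixes that appear in this
# thread, plus any line mentioning reiserfs (case-insensitive). This is an
# illustration only, not an exhaustive list of reiserfs kernel messages.
REISERFS_MSG = re.compile(
    r"\b(?:vs-\d+:|is_leaf:|is_tree_node:|reiserfs)", re.IGNORECASE
)


def reiserfs_excerpts(lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that look like reiserfs kernel messages."""
    return [line.rstrip("\n") for line in lines if REISERFS_MSG.search(line)]
```

You can pass it an open log file, for example `reiserfs_excerpts(open("/var/log/kern.log"))`, and it returns only the lines worth posting. The log path is an assumption and differs between distributions.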



Re: [reiserfs-list] Filesystem Corruption

2002-06-11 Thread Kurt

Thanks Oleg,
Sorry for the late response (I was out of the office). You may find the
following information on the last crash useful:
+++
3 04:32:37 devo kernel: vs-13075: reiserfs_read_inode2: dead inode read from disk [854 1695654 0x0 SD]. This is likely to be race with knfsd. Ignore
Jun  3 04:32:39 devo kernel: vs-13060: reiserfs_update_sd: stat data of object [854 1695654 0x0 SD] (nlink == 1) not found (pos 1)
Jun  3 04:41:38 devo kernel: vs-13060: reiserfs_update_sd: stat data of object [854 1695654 0x0 SD] (nlink == 1) not found (pos 1)
Jun  3 04:41:43 devo kernel: vs-13060: reiserfs_update_sd: stat data of object [854 1695654 0x0 SD] (nlink == 1) not found (pos 1)

I will upgrade the kernel and reiserfs tools this week and inform you of the 
result after a fsck.
-Kurt

On Friday 07 June 2002 3:15 am, Oleg Drokin wrote:
> Hello!
>
> On Thu, Jun 06, 2002 at 02:00:01PM -0400, Kurt wrote:
> > error stating the file pointed to nowhere.
> > I was unable to complete a reiserfsck --fix-fixable because of the length
> > of time that this (fsck) process took since this was an unscheduled
> > downtime. During the weekend I will attempt to do the fsck again, however
> > I really needed to know if this problem has been observed by anyone else,
> > and what steps they took to fix the problem.
>
> We recommend that you upgrade your kernel to 2.4.18.
> To determine the exact problem, it would be very useful if you could post
> excerpts from the kernel logs showing the actual errors.
> Thank you.
>
> Bye,
> Oleg

-- 

Kurt Palmer  SysAdmin
[EMAIL PROTECTED]  Advance Internet
201-459-2846




[reiserfs-list] Filesystem corruption after resize

2002-06-11 Thread Baldur Norddahl

Hello,

First something about my setup:

md0: 8x80 GB in a RAID5 configuration
md1: 4x160 GB in a RAID5 configuration
/dev/vg01/stuff: the union of md0 and md1 done with lvm.

dark:/mnt# reiserfsck -V

<-reiserfsck, 2002->
reiserfsprogs 3.x.1a

dark:/mnt# resize_reiserfs -v

<-resize_reiserfs, 2002->
reiserfsprogs 3.x.1a

Usage: resize_reiserfs  [-s[+|-]#[G|M|K]] [-fqv] device

dark:/mnt# cat /proc/version 
Linux version 2.4.18 (root@dark) (gcc version 2.95.4 20011006 (Debian
prerelease)) #1 SMP Fri Apr 12 13:40:03 CEST 2002

The system is a dual AMD Athlon(tm) MP 1800+ (1533 MHz), with 1 GB memory.

Recently one of the 160 GB disks died. Since I still had enough free space
and wanted to preserve redundancy, I used resize_reiserfs to shrink the
filesystem, then used LVM to move it off the now non-redundant md1 device.

The exact commands used are:

resize_reiserfs -s 400G /dev/vg01/stuff
lvreduce -l 16693 /dev/vg01/stuff
pvmove -v /dev/md1
vgreduce -v vg01 /dev/md1
resize_reiserfs /dev/vg01/stuff
reiserfsck --check /dev/vg01/stuff
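
Before `lvreduce`, the new logical volume must still be at least as large as the shrunk filesystem. Otherwise the tail of the filesystem is cut off. Here is a minimal sketch of that arithmetic. It assumes `-s 400G` means 400 GiB and that the volume group uses a 32 MiB physical extent size. The extent size is an assumption; `vgdisplay` reports the real `PE Size`. The helper names are hypothetical.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def min_extents(fs_bytes: int, pe_bytes: int) -> int:
    """Smallest number of physical extents that can hold fs_bytes (ceiling division)."""
    return -(-fs_bytes // pe_bytes)


def lv_reduce_is_safe(fs_bytes: int, new_extents: int, pe_bytes: int) -> bool:
    """True if a logical volume of new_extents extents is no smaller than the filesystem."""
    return new_extents * pe_bytes >= fs_bytes


# Assumptions: "-s 400G" means 400 GiB and the volume group uses 32 MiB extents.
fs_size = 400 * GiB
pe_size = 32 * MiB
print(min_extents(fs_size, pe_size))               # 12800
print(lv_reduce_is_safe(fs_size, 16693, pe_size))  # True
```

Under those assumptions, 16693 extents (about 521.7 GiB) leave ample room. With 4 MiB extents, the same count would give only about 65 GiB and would truncate the filesystem.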

This all worked like a charm, until I noticed that a nightly script that
scans all files was no longer able to access about 20 files (access denied,
even though the script runs as root).

Dmesg is full of this:

vs-5150: search_by_key: invalid format found in block 66153. Fsck?
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [163330 163334 0x0 SD]
is_leaf: free space seems wrong: level=1, nr_items=1, free_space=3040 rdkey
vs-5150: search_by_key: invalid format found in block 72879. Fsck?
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [168724 168732 0x0 SD]
is_tree_node: node level 29122 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 70647. Fsck?
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [167220 167223 0x0 SD]
is_tree_node: node level 2 does not match to the expected one 1
vs-5150: search_by_key: invalid format found in block 66153. Fsck?
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat data of [163330 163334 0x0 SD]

And so on; a lot of this output repeats.
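
When the same errors repeat, it helps to reduce the dmesg output to the distinct bad blocks and objects before reporting it. Below is a rough sketch that assumes the message formats quoted above. reiserfs prints keys as `[dir_id objectid offset type]`, where `SD` marks a stat-data item. The regular expressions and the `summarize` helper are illustrative, not an official format specification.

```python
import re
from collections import Counter
from typing import Set, Tuple

# Illustrative patterns for the messages quoted above (not an official format spec).
BAD_BLOCK = re.compile(r"invalid format found in block (\d+)")
# Keys print as [dir_id objectid offset type]; "SD" marks a stat-data item.
# \s+ tolerates the line wrapping seen in mailed dmesg output.
SD_KEY = re.compile(
    r"stat\s+data\s+of\s+(?:object\s+)?\[(\d+) (\d+) 0x[0-9a-fA-F]+ SD\]"
)


def summarize(text: str) -> Tuple[Counter, Set[Tuple[int, int]]]:
    """Count reports per bad block and collect (dir_id, objectid) of unreadable stat data."""
    blocks = Counter(int(m.group(1)) for m in BAD_BLOCK.finditer(text))
    objects = {(int(m.group(1)), int(m.group(2))) for m in SD_KEY.finditer(text)}
    return blocks, objects
```

On the excerpt above, it reports block 66153 twice and blocks 72879 and 70647 once each, together with three distinct stat-data keys.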

reiserfsck --fix-fixable /dev/vg01/stuff crashes.

Btw, a separate problem: I am never able to unmount this filesystem
properly. I always get this error:

dark:/mnt# umount stuff
umount: /mnt/stuff: device is busy
dark:/mnt# fuser -v stuff

                     USER        PID ACCESS COMMAND
stuff                root     kernel mount  /mnt/stuff

So without rebooting I can't quote the exact output from --fix-fixable,
but it is approximately the same as when I just run it plain:

dark:/mnt# reiserfsck -l /root/reiserfsck.log /dev/vg01/stuff

<-reiserfsck, 2002->
reiserfsprogs 3.x.1a

Will read-only check consistency of the filesystem on /dev/vg01/stuff
Will put log info to '/root/reiserfsck.log'

Do you want to run this program?[N/Yes] (note need to type Yes):Yes
###
reiserfsck --check started at Tue Jun 11 16:36:38 2002
###
Filesystem seems mounted read-only. Skipping journal replay..
Checking S+tree../  4 (of   6)/ 27 (of 132)/ 44 (of 152)
bit 1359513587, bitsize 136749056
reiserfsck: bitmap.c:168: reiserfs_bitmap_test_bit: Assertion `bit_number < bm->bm_bit_size' failed.
Aborted
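
The assertion means reiserfsck found a block number that lies past the end of the filesystem bitmap, which has one bit per block. Here is a small worked check of the numbers above. It assumes the default 4 KiB reiserfs block size; `block_in_range` is a hypothetical helper.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB
BLOCK_SIZE = 4096  # assumption: default reiserfs block size


def block_in_range(block: int, block_count: int) -> bool:
    """The bitmap has one bit per block, so valid block numbers are 0..block_count-1."""
    return 0 <= block < block_count


bitsize = 136749056   # "bitsize" printed by reiserfsck
bad_bit = 1359513587  # "bit" printed by reiserfsck

print(block_in_range(bad_bit, bitsize))  # False
print(bitsize * BLOCK_SIZE // MiB)       # 534176 (filesystem size in MiB)
print(bad_bit * BLOCK_SIZE // GiB)       # 5186 (GiB offset of the bad pointer)
```

534176 MiB is exactly 16693 × 32 MiB. That is consistent with the final resize_reiserfs having grown the filesystem to fill a 16693-extent volume with 32 MiB extents, although this is an inference the thread does not confirm. The out-of-range pointer, roughly 5 TiB into the device, looks like garbage from a damaged node rather than a block just past the new end of the filesystem.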


What can I do to resolve this?

Thanks,
  Baldur





Re: [reiserfs-list] Filesystem corruption after resize

2002-06-12 Thread Baldur Norddahl

Quoting Vitaly Fertman ([EMAIL PROTECTED]):
> Hi, 
> 
> > Hello,
> > The exact commands used are:
> >
> > resize_reiserfs -s 400G /dev/vg01/stuff
> > lvreduce -l 16693 /dev/vg01/stuff
> > pvmove -v /dev/md1
> > vgreduce -v vg01 /dev/md1
> > resize_reiserfs /dev/vg01/stuff
> > reiserfsck --check /dev/vg01/stuff
> >
> > This all worked like a charm, until I noticed that a nightly script that
> > scans all files, no longer was able to access about 20 files (access denied
> > even though the script is running as root).
> 
> Do you mean reiserfsck finished without any error/warning message? 

Yes, it did not detect any errors after the resize. The errors turned up a
day later, so it is not 100% certain that the two events are linked. But
since nothing else was done that could explain the corruption, that is the
theory I am working on.

> The progs I sent you are what is going to be the next release. 
> Please run --check and tell me what is in fsck.log. You can run 
> --fix-fixable if it says so, but it would be better to run 
> --rebuild-tree on a copy (these progs are not a release yet). Or you can do the following:
> 
> debugreiserfs/debugreiserfs -p /dev/vg01/stuff | gzip -c > stuff.gz
> 
> It will pack the metadata (without file bodies); I will download it and
> test it locally.

I will send you those two files in a separate mail.

I copied all the data over to the other RAID device, so I am not very
concerned about rescuing the filesystem; I could just reformat the whole
thing and copy the files back.

But I would very much like to find out what happened so I can take action
to prevent it from happening again. In particular, I need to know whether
resizing on LVM devices works properly, since I will need to resize again
shortly when the replacement disk arrives.

Baldur