ReiserFS errors in /var/log/messages

2005-09-14 Thread Dan Oglesby
I recently started seeing the following message repeating in my 
/var/log/messages file on a server:


kernel: is_leaf: free space seems wrong: level=1, nr_items=39, 
free_space=0 rdkey

kernel: vs-5150: search_by_key: invalid format found in block 5405324. Fsck?
kernel: vs-13070: reiserfs_read_inode2: i/o failure occurred trying to 
find stat data of [11 618208 0x0 SD]


Every day a large filesystem rsyncs to another filesystem for 
backup/redundancy, and it looks like this coincides with the error 
listed above.  It repeats about 13 times every time rsync runs.  It's 
always the same block number.
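Dan's observation that the warning always names the same block can be checked by tallying block numbers and stat-data keys from the log. Below is a minimal sketch. It assumes each kernel message sits on one line in /var/log/messages (the excerpt above is wrapped for email). It also assumes keys print as [dir_id objectid offset type]. The regular expressions cover only the two message formats quoted in this thread.

```python
import re
from collections import Counter

# Wording copied from the warnings quoted in this thread; other kernel
# versions may phrase these messages differently.
BLOCK_RE = re.compile(r"vs-5150: search_by_key: invalid format found in block (\d+)")
# Assumed key layout: [dir_id objectid offset type], offset printed in hex.
KEY_RE = re.compile(r"find stat data of \[(\d+) (\d+) (0x[0-9a-fA-F]+) (\w+)\]")


def summarize(lines):
    """Count how often each bad block and each unreadable stat-data key appears."""
    blocks = Counter()
    keys = Counter()
    for line in lines:
        m = BLOCK_RE.search(line)
        if m:
            blocks[int(m.group(1))] += 1
        m = KEY_RE.search(line)
        if m:
            dir_id, objectid, offset, kind = m.groups()
            keys[(int(dir_id), int(objectid), int(offset, 16), kind)] += 1
    return blocks, keys
```

Passing an open log file, e.g. `summarize(open("/var/log/messages"))`, would give per-block and per-key counts. A single block dominating the tally matches what Dan describes.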


This filesystem lives on a 3Ware RAID-5 array and is running a 2.4.20 
kernel from RedHat that we cannot upgrade due to hardware and software 
dependencies.


I'm planning on downloading the latest version of the ReiserFS tools, 
and checking for errors/repairing.


Anyone know what this error means, so I have a better idea of what I'm 
dealing with?  I find it strange that I could have a block error on a 
logical drive (the RAID-5 device).
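The "is_leaf: free space seems wrong" warning comes from a sanity check the kernel runs on leaf nodes (level=1) while searching the tree. Roughly, the check confirms that the free space recorded in the node header matches the gap between two regions. Item headers grow forward from the start of the block, and item bodies are packed backwards from its end. The sketch below illustrates that arithmetic; it is not the kernel's actual code. The 24-byte block-header and item-header sizes are assumed from the 3.6 on-disk format.

```python
# Assumed ReiserFS 3.6 on-disk sizes: a 24-byte block header at the start
# of each node and a 24-byte header for every item.
BLKH_SIZE = 24
IH_SIZE = 24


def free_space_consistent(blocksize, nr_items, last_item_location, free_space):
    """Return True if a leaf's recorded free space matches its layout.

    Item headers follow the block header; item bodies are packed from the
    end of the block, so the last item's body starts at last_item_location.
    Everything between the last header and that body should be free.
    """
    used = BLKH_SIZE + IH_SIZE * nr_items + (blocksize - last_item_location)
    return used == blocksize - free_space
```

For the block in Dan's log (nr_items=39, free_space=0, assuming a 4096-byte block), the check passes only if the last item body starts at offset 960. Any other layout would be reported as corruption, regardless of the RAID device underneath.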


Thanks...

--Dan


Re: reiserfs errors and kernel panic, are they related?

2004-08-28 Thread Sean Plaice
On Thu, 26 Aug 2004 18:28:37 -0700, Sean Plaice <[EMAIL PROTECTED]> wrote:
> Hello,
> I just spent the last couple hours in our dev environment simulating
> backing up and restoring reiserfs partitions using dd_rescue. So
> please ignore the questions regarding best practices for repairing
> file system corruption via --rebuild-tree.
> 
> When I have an available outage window I will attempt to repair the
> file system and confirm if the kernel panic can be reproduced. I will
> have a serial console available at that time so I can capture the
> complete panic message.

Hello,
I was able to repair the filesystem that was reporting the corruption
tonight using dd_rescue and --rebuild-tree. Repairing the filesystem
also fixed the problem with the kernel panics. I have not been able to
reproduce the kernel panic since I repaired the filesystem.

Before I repaired the filesystem I was able to get a full copy of the
oops with a serial console.

Below is the full oops, and the oops as run through ksymoops. I have
never used ksymoops before, but these are the best results (fewest
warnings) I was able to achieve. I am unsure why it complains about
the modules being different; I used the same modules that are
contained in the initrd that my system uses at boot.

I am sure the level of interest in reiserfs 3.6 stuff is at an all-time
low with reiser4 being added to the main tree, but if anyone finds
any of this information worth debugging and would like further
information from me, please let me know. I still have an image of the file
system in its corrupted state if it is needed to debug this problem.

Take care.

Raw oops message:
Unable to handle kernel paging request at virtual address f800
 printing eip:
f884c45a
*pde = 
Oops: 
tg3 floppy sg microcode reiserfs aacraid sd_mod scsi_mod  
CPU:0
EIP:0060:[]Not tainted
EFLAGS: 00010203

EIP is at reiserfs_readdir [reiserfs] 0x21a (2.4.22-1.2199.nptlsmp)
eax:    ebx: ddaf   ecx: 3f70013d   edx: 
esi: f7fe   edi: f1695758   ebp: ef297ea0   esp: ef297e08
ds: 0068   es: 0068   ss: 0068
Process updatedb (pid: 2113, stackpage=ef297000)
Stack: ef297ec0 ef297ee0  0db0  65636552  7ee8 
     f5c300c0 0db1  f5c00228 2fc3 0016 
   f729fe00 f70d7880 f729fe00 0016 f5c00228 2fb7 f5c00490 0008 
Call Trace:   [] balance_dirty [kernel] 0xc (0xef297e80)
[] __block_commit_write [kernel] 0x84 (0xef297e8c)


Code: f3 a5 f6 c3 02 74 02 66 a5 f6 c3 01 74 01 a4 8b 44 24 20 c7 
 <1>Unable to handle kernel paging request at virtual address b7b0f7d7
 printing eip:
c013b540
*pde = 
Oops: 
tg3 floppy sg microcode reiserfs aacraid sd_mod scsi_mod  
CPU:0
EIP:0060:[]Not tainted
EFLAGS: 00010286

EIP is at lock_vma_mappings [kernel] 0x10 (2.4.22-1.2199.nptlsmp)
eax: b7b0f7cf   ebx: f68dfd00   ecx: f6dce3b8   edx: 
esi: f6dce380   edi: 00015000   ebp: 0027f000   esp: ef297c98
ds: 0068   es: 0068   ss: 0068
Process updatedb (pid: 2113, stackpage=ef297000)
Stack: c013cd16 f68dfd00   f167bb00 f6dce380 f6dce380 ef296000 
   000b c0121c6c f6dce380 c0413a00 f6dce380  ef296000 c0128436 
   f6dce380 0068 ef297dd4 4e00 c0004e00 f800 c010a224 000b 
Call Trace:   [] exit_mmap [kernel] 0x96 (0xef297c98)
[] mmput [kernel] 0x6c (0xef297cbc)
[] do_exit [kernel] 0x136 (0xef297cd4)
[] die [kernel] 0x94 (0xef297cf0)
[] do_page_fault [kernel] 0x2a3 (0xef297d04)
[] scheduler_tick [kernel] 0x120 (0xef297d38)
[] is_tree_node [reiserfs] 0x74 (0xef297d40)
[] search_by_key [reiserfs] 0x594 (0xef297d54)
[] update_process_times [kernel] 0x3e (0xef297d94)
[] smp_apic_timer_interrupt [kernel] 0x14c (0xef297db0)
[] do_page_fault [kernel] 0x0 (0xef297dc0)
[] error_code [kernel] 0x34 (0xef297dc8)
[] reiserfs_readdir [reiserfs] 0x21a (0xef297dfc)
[] balance_dirty [kernel] 0xc (0xef297e80)
[] __block_commit_write [kernel] 0x84 (0xef297e8c)


Code: 8b 40 08 8b 90 c0 00 00 00 85 d2 74 0a f0 fe 4a 2c 0f 88 e3 
 <1>Unable to handle kernel paging request at virtual address 5c20c1cc
 printing eip:
c016d34d
*pde = 
Oops: 
tg3 floppy sg microcode reiserfs aacraid sd_mod scsi_mod  
CPU:0
EIP:0060:[]Not tainted
EFLAGS: 00010286

EIP is at dnotify_flush [kernel] 0x1d (2.4.22-1.2199.nptlsmp)
eax: f108ad80   ebx: f186f600   ecx: f653c880   edx: 5c20c19a
esi: f65d5780   edi: f186f600   ebp:    esp: ef297b18
ds: 0068   es: 0068   ss: 0068
Process updatedb (pid: 2113, stackpage=ef297000)
Stack: f654c300 f654c300 f186f600  f65d5780 c015079b f186f600 f65d5780 
   0003 0004 f65d5780 0001 c01276fc f186f600 f65d5780  
    ef296000 000b c012846a f65d5780 0068 ef297c64  
Call Trace:   [] filp_close [kernel] 0x7b (0xef297b2c)
[] put_files_struct [kernel] 0x6c (0xef297b48)
[] do_exit [kernel] 0x16a (0xef297b64)
[] die [kernel] 0x94 (0xef297b80)
[] do_page_f

reiserfs errors and kernel panic, are they related?

2004-08-26 Thread Sean Plaice
Hello,
In the last couple of days one of my production servers started
rebooting due to a kernel panic. I believe something in the reiserfs
file system could be causing the kernel to panic. The panic also
corrupts some system files that are heavily accessed when it occurs.

I will detail the scenario as best I can below. I was able to find
and replicate what is causing the panic, but because the server is
in production I have refrained from extensive testing until I can
schedule an outage window. I have also refrained from trying to repair
the file system errors, to avoid making an uninformed attempt that could
cause more harm than good.

System Details:
Dell Poweredge 2650 - Dual Intel Xeon 2.8Ghz
PERC3di SCSI-RAID Controller using the aacraid driver on RAID-10 raid set.
Red Hat/Adaptec aacraid driver (1.1-3 Aug  4 2004 12:11:35)

Fedora Core 1
Kernel:  2.4.22-1.2199.nptlsmp

Tracking down error messages has been difficult; the system's syslog
appears to fail to record the kernel error messages. However, I was able
to find some error messages in the log of a scheduled job that runs
on the server and repeatably triggers the kernel panic. I was also
able to take a screenshot of part of the kernel panic message using the
remote access console (no serial console as of yet).

Kernel Panic Message:
EIP:0060:[]   Not tainted
EFLAGS: 00010206

EIP is at do_page_fault [kernel] 0x26a (2.4.22-1.2199.nptlsmp)
eax: 0013   ebx: 73747000   ecx: c0374888   edx: 6912
esi: f7facca4   edi: f7ffa000   ebp: 000f   esp: f7ffbe18
ds: 0068   es: 0068   ss: 0068
Process init (pid: 1, stackpage=f7ffb000)
Stack: c02a68af 73747069  f7ffbee8  f88630bf 0001 1680f54c
   0003 0017 001b657a  0206 c0376730 00030001 
   c037667c 0286 0001 f1dca8c0   0003 f1dca8c0
Call Trace: [] check_journal_end [reiserfs] 0x16f (0xf7ffbe2c)
[] schedule [kernel] 0x3fc (0xf7ffbe90)
[] do_page_fault [kernel] 0x0 (0xf7ffbed0)
[] error_code [kernel] 0x34 (0xf7ffbed8)
[] poll_freewait [kernel] 0x23 (0xf7ffbf0c)
[] do_select [kernel] 0x151 (0xf7ffbf24)
[] sys_select [kernel] 0x34e (0xf7ffbf60)
[] sys_fstat64 [kernel] 0x49 (0xf7ffbfa8)
[] system_call [kernel] 0x33 (0xf7ffbfc0)


Code: 8b 9c ab 00 00 00 c0 c7 04 24 c0 68 2a c0 89 5c 24 04 e8 ef
 <0>Kernel panic: Attempted to kill init!

I am able to reproduce the kernel panic by running the prelinking and
slocate daily cron jobs. The log for the prelinking job contains some
syslog messages regarding reiserfs errors. It appears that this
information was concatenated with the prelinking log due to corruption,
since the end of the file is filled with garbage binary data.

Here are the errors listed in the prelinking log. 
/usr/lib/libtiff.so.3.5  0040Aug 23 21:02:09
 mail01 syslogd 1.4.1: restart.
Aug 23 21:02:10 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 25)
Aug 23 21:02:15 mail01 last message repeated 12 times
Aug 23 21:02:16 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 27)
Aug 23 21:02:18 mail01 last message repeated 20 times
Aug 23 21:02:22 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 25)
Aug 23 21:02:32 mail01 last message repeated 24 times
Aug 23 21:02:33 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 27)
Aug 23 21:02:33 mail01 last message repeated 5 times
Aug 23 21:02:35 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 25)
Aug 23 21:02:35 mail01 last message repeated 7 times
Aug 23 21:02:36 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 27)
Aug 23 21:02:36 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 27)
Aug 23 21:02:39 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 25)
Aug 23 21:02:42 mail01 last message repeated 8 times
Aug 23 21:02:43 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 29)
Aug 23 21:02:43 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 29)
Aug 23 21:02:43 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 1) not found (pos 25)
Aug 23 21:02:43 mail01 last message repeated 3 times
Aug 23 21:02:44 mail01 kernel: sd(8,6):vs-13060: reiserfs_update_sd: stat data of object [1148 1150 0x0 SD] (nlink == 
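syslogd folds identical consecutive messages into "last message repeated N times" lines. Counting the raw lines in an excerpt like Sean's therefore understates how often the warning fired. The sketch below expands those markers before counting. It assumes each marker refers to the line immediately before it, as it does in this excerpt.

```python
import re

REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def expand_repeats(lines):
    """Yield messages with syslogd's "last message repeated N times" markers
    replaced by N further copies of the preceding message."""
    previous = None
    for line in lines:
        m = REPEAT_RE.search(line)
        if m:
            # A marker with no preceding message cannot be expanded; skip it.
            if previous is not None:
                yield from [previous] * int(m.group(1))
            continue
        previous = line
        yield line


def count_matching(lines, needle):
    """Count expanded messages that contain the given substring."""
    return sum(1 for line in expand_repeats(lines) if needle in line)
```

For example, `count_matching(lines, "(pos 25)")` over the excerpt above counts every "pos 25" warning, including the folded repeats.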

reiserfs errors

2003-09-08 Thread Fong Vang
Could someone tell me what these error messages mean (these messages 
appear on the console and in /var/log/messages)?

Aug 30 15:49:51 fongtest kernel: is_leaf: free space seems wrong: 
level=1, nr_items=58, free_space=12 rdkey
Aug 30 15:49:51 fongtest kernel: vs-5150: search_by_key: invalid 
format found in block 15395. Fsck?
...
...
Aug 30 15:49:51 fongtest kernel: vs-5150: search_by_key: invalid 
format found in block 15395. Fsck?
Aug 30 15:49:51 fongtest kernel: vs-13070: reiserfs_read_inode2: i/o 
failure occurred trying to find stat data of [6 86762 0x0 SD]
Aug 30 15:49:51 fongtest kernel:  [6 86779 0x0 SD]
...

It doesn't say which device is giving this error message.

This is the system configuration:

Dual 2.4 Intel Xeon Processors (on SuperMicro motherboard)
RedHat Linux 7.1
RedHat's 2.4.18-24.7.xsmp kernel
ReiserFS version 3.6.25


Any help would be appreciated.






Re: Mapping reiserfs errors to hardware devices or filesystems

2003-07-28 Thread Vitaly Fertman

Hi, 

On Monday 28 July 2003 09:01, Donald Thompson wrote:
> I believe I'm having a hardware issue, but I have no idea what device is
> the problem from the error messages I'm getting.
>
> I'm seeing the following repeated:
>
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-5150: search_by_key: invalid format found in block 90922. Fsck?
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [26164 26256 0x0 SD]
>
> It's been repeating itself over several reboots. It does not seem to be
> fatal, as the system remains up and working properly.

That is just bad phrasing in the message; the problem here is only in the
tree structure. Please get the latest reiserfsprogs from our ftp site and
run reiserfsck.

> The entire system, including the root device uses reiserfs. 4 of the
> filesystems sit on logical volumes. This is a debian sid system, kernel
> 2.4.20.

You can run reiserfsck --check on all filesystems. Or you can run
debugreiserfs -1 90922 /dev/xxx
on each filesystem to print the contents of its block #90922. The output
will contain lines like

|  0|280725 786601 0x0 SD (0), len 32, location 4064 . 

where '280725 786601 0x0 SD (0)' is the key of an item and 'SD (0)' is 
the uniqueness. Run reiserfsck on the filesystem whose block #90922 has an
item with uniqueness 26169.
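Vitaly's procedure can be partly automated by scanning the debugreiserfs output for an item whose uniqueness matches the one in the warning. The line format is inferred from the single sample line quoted above, so the regular expression is an assumption. Items with an unrecognized type may print differently.

```python
import re

# Format inferred from one sample line (an assumption):
#   |  0|280725 786601 0x0 SD (0), len 32, location 4064 .
ITEM_RE = re.compile(
    r"\|\s*(\d+)\|\s*(\d+)\s+(\d+)\s+(0x[0-9a-fA-F]+)\s+(\w+)\s+\((\d+)\),"
    r"\s*len\s+(\d+),\s*location\s+(\d+)"
)


def items_with_uniqueness(dump_text, uniqueness):
    """Return (item_number, key) for each item whose uniqueness matches."""
    hits = []
    for m in ITEM_RE.finditer(dump_text):
        num, dir_id, objectid, offset, kind, uniq = m.groups()[:6]
        if int(uniq) == uniqueness:
            key = (int(dir_id), int(objectid), int(offset, 16), kind)
            hits.append((int(num), key))
    return hits
```

One would capture the output of debugreiserfs -1 90922 /dev/xxx for each candidate device and pass it in with uniqueness=26169. A non-empty result points at the filesystem that needs reiserfsck.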

> Is there any way from the messages generated to identify what filesystem
> is having problems? I figure my only other option is to go into single
> user mode, mount everything read-only and fsck each filesystem. I'm just
> lazy and was hoping there was an easier way of tracking it down.

If I recall correctly, Oleg made some improvements in these warnings some 
time ago.

-- 
Thanks,
Vitaly Fertman


Re: Mapping reiserfs errors to hardware devices or filesystems

2003-07-27 Thread Yury Umanets
On Mon, 2003-07-28 at 09:01, Donald Thompson wrote:
> I believe I'm having a hardware issue, but I have no idea what device is
> the problem from the error messages I'm getting.
> 
> I'm seeing the following repeated:
> 
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-500: unknown uniqueness 26169
> vs-5150: search_by_key: invalid format found in block 90922. Fsck?
> vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
> data of [26164 26256 0x0 SD]


> 
> It's been repeating itself over several reboots. It does not seem to be
> fatal, as the system remains up and working properly.
> 
> The entire system, including the root device uses reiserfs. 4 of the
> filesystems sit on logical volumes. This is a debian sid system, kernel
> 2.4.20.
> 
> Is there any way from the messages generated to identify what filesystem
> is having problems? I figure my only other option is to go into single
> user mode, mount everything read-only and fsck each filesystem. I'm just
> lazy and was hoping there was an easier way of tracking it down.
> 
> Any help is much appreciated.

Hello,

This is fixed in the latest pre kernels.


> 
> -Don
-- 
We're flying high, we're watching the world passes by...



Mapping reiserfs errors to hardware devices or filesystems

2003-07-27 Thread Donald Thompson
I believe I'm having a hardware issue, but I have no idea what device is
the problem from the error messages I'm getting.

I'm seeing the following repeated:

vs-500: unknown uniqueness 26169
vs-500: unknown uniqueness 26169
vs-500: unknown uniqueness 26169
vs-500: unknown uniqueness 26169
vs-500: unknown uniqueness 26169
vs-500: unknown uniqueness 26169
vs-5150: search_by_key: invalid format found in block 90922. Fsck?
vs-13070: reiserfs_read_inode2: i/o failure occurred trying to find stat
data of [26164 26256 0x0 SD]

It's been repeating itself over several reboots. It does not seem to be
fatal, as the system remains up and working properly.

The entire system, including the root device uses reiserfs. 4 of the
filesystems sit on logical volumes. This is a debian sid system, kernel
2.4.20.

Is there any way from the messages generated to identify what filesystem
is having problems? I figure my only other option is to go into single
user mode, mount everything read-only and fsck each filesystem. I'm just
lazy and was hoping there was an easier way of tracking it down.

Any help is much appreciated.

-Don


Re: [reiserfs-list] Serious ReiserFS errors when updating from 2.4.18pre9 to rc1

2002-02-19 Thread Benjamin Scott

On Tue, 19 Feb 2002, Jens Benecke wrote:
> I know. I did, for exactly this reason. It's a trade-off, in some ways.
> Until somebody finds a way to lock parts of the file system in idle times,
> to do the integrity check 'live' (like scandisk does for FAT).

  This really doesn't work for that, either.  Basically, Microsoft's
ScanDisk sets up a kernel hook, where ScanDisk is notified of any
application wanting to write to the disk.  At that point, ScanDisk releases
all locks and allows the write, and restarts the ScanDisk process from the
beginning.  On an active system, this means ScanDisk sits there in
live-lock, continuously restarting, until it eventually reaches some
internal limit and cancels the scan.  Needless to say, this doesn't work
very well.
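The live-lock Ben describes shows up clearly in a toy model. A checker that restarts from the first block whenever a write arrives never finishes if writes keep coming faster than it can scan. This sketch models only his description, not ScanDisk's real implementation. The restart limit is a hypothetical parameter.

```python
def scan_with_restarts(num_blocks, write_times, max_restarts):
    """Toy model of a checker that starts over whenever an application writes.

    One block is checked per time step. write_times holds the steps at which
    a write arrives; the checker lets the write through and restarts from
    block 0. Once the restart count exceeds max_restarts it gives up.
    Returns (completed, restarts).
    """
    writes = set(write_times)
    position = 0
    restarts = 0
    t = 0
    while position < num_blocks:
        if t in writes:
            restarts += 1
            if restarts > max_restarts:
                return False, restarts
            position = 0
        else:
            position += 1
        t += 1
    return True, restarts
```

On an idle disk the scan completes with no restarts. With a write every 5 steps and 10 blocks to check, it never gets past block 4 and eventually cancels, which is the "live-lock" outcome.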

  I suppose you *might* be able to pull this off by deferring filesystem
commits, keeping the backlog in the journal.  This would keep the filesystem
proper in a consistent state for the check.  You would need full data
journaling, though, and your journal could fill up pretty quickly.  Hmm, and
if you actually *find* an error, what happens to the deferred writes still
in the journal?  I suppose you could just set an error bit, commit the
journal, and do an offline repair when you can.

  No, I'm not volunteering to implement the above.  Hell, I don't even know
if it would work, since I just made the whole thing up on the spot.  :-)

-- 
Ben Scott <[EMAIL PROTECTED]>
| The opinions expressed in this message are those of the author and do not |
| necessarily represent the views or policy of any other person, entity or  |
| organization.  All information is provided without warranty of any kind.  |




Re: [reiserfs-list] Serious ReiserFS errors when updating from 2.4.18pre9 to rc1

2002-02-18 Thread Oleg Drokin

Hello!

On Tue, Feb 19, 2002 at 12:16:12AM +0100, Jens Benecke wrote:
> > This is easily reproducible on 2.5, so there is no point in doing it on
> > 2.4.  2.4 and 2.5 share most of the code, anyway.
> Right... but I'm not planning to use 2.5 any time soon. Perhaps you
> understand why if you look at www.jensbenecke.de/misc/ws02k.ps. =;)
I am not going to urge anyone to try 2.5 for any use anyway. ;)
I just wanted to make sure you were not using 2.5.

> > Basically, I think if you read some errors were noticed in some
> > filesystem, and then fixed and you plan to upgrade to that kernel, it
> > is better to run fsck first, just in case. So that you are sure you
> Yes, I will next time. The problem is I (usually) upgrade only when
> I have specific problems with the current kernel, and not when I don't
> notice any problems - AND I would have about 240G of fsck to go through,
> which isn't really little.
Then I doubt you upgrade kernels often, so you could probably
run fsck at upgrade time. It's up to you, anyway.

> > were not bitten by previous errors (and you also have perfectly valid
> > reason to argue that new fixes broke something, if new code breaks and
> > there were no errors on the filesystem before new code was run).
> The problem is, can you expect the new code to handle broken old data?
Either we can or stuff breaks and warnings/errors are displayed.
I see little reason to handle totally bogus data, in fact.
That's what fsck is for.

> Yes, I wasn't talking about hardware errors. In THEORY, this shouldn't
> happen. ReiserFS _should_ fix every file system (ie metadata) error on
> journal replay, right?
Yes. (though with HDDs that have write cache turned on by default
(and not battery backed) we may have problems here).

> But suppose it doesn't find them all, or there is an obscure bug in the
> journal code, or whatever. So next time you boot you _do_ have subtly
> broken metadata on the disk. Fsck _would_ find this, but it never gets
> executed automatically, because of journalling. So the error stays.
fsck is executed automatically on system startup (if you have a nonzero value
in the fsck priority field of your fstab); it is just that reiserfsck was not
trusted to run without the user's control. I hope we can get reiserfsck into
much more stable shape; then on error we'd just set an "error" bit in the
superblock, and reiserfsck would do a full scan on the subsequent reboot
(well, maybe not a full scan, but something certainly can be done here).
Right now reiserfsck exits when run from fsck -A.
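Oleg's point about the fstab is that boot-time fsck -A visits only entries whose sixth field (the pass number) is nonzero. The sketch below lists the ReiserFS entries that boot-time fsck would reach. The device names in the test are hypothetical. As Oleg notes, reiserfsck of this era simply exits when invoked that way.

```python
def fsck_candidates(fstab_text, fstype="reiserfs"):
    """Return (device, mountpoint, passno) for fstab entries of the given
    filesystem type whose sixth field (fs_passno) is nonzero, i.e. the
    entries that fsck -A at boot would hand to the filesystem's checker."""
    found = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Fields: device, mount point, type, options, dump, pass number.
        fields = line.split()
        if len(fields) < 3 or fields[2] != fstype:
            continue
        # A missing pass field is treated as 0 (never checked at boot).
        passno = int(fields[5]) if len(fields) >= 6 else 0
        if passno != 0:
            found.append((fields[0], fields[1], passno))
    return found
```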

> I know. I did, for exactly this reason. It's a trade-off, in some ways.
> Until somebody finds a way to lock parts of the file system in idle
> times, to do the integrity check 'live' (like scandisk does for FAT).
> Something like
>   - no write access for >$TIMEOUT, and no fsck in > 1 month?
> -> remount read-only, start fsck in background
fsck on a read-only mount is not very safe either.
It can move some blocks, and it can delete some stuff which the kernel believes
should be there.

> on-line fsck would be just about as appreciated as on-line resizing etc,
> because it potentially saves a lot of downtime. 
I believe that a bug-free, robust filesystem will be appreciated much more ;)
Unfortunately this goal is almost impossible to achieve.

> AND, you'd be the first, not only on Linux, to have this, AFAIK there is
> no serious file system that can do this.
The FreeBSD people claim they can (or soon will be able to) take
a snapshot of their FS and run fsck on it, while everything else
keeps working in rw mode as usual.

Bye,
Oleg



Re: [reiserfs-list] Serious ReiserFS errors when updating from 2.4.18pre9 to rc1

2002-02-18 Thread toad

> I know. I did, for exactly this reason. It's a trade-off, in some ways.
> Until somebody finds a way to lock parts of the file system in idle
> times, to do the integrity check 'live' (like scandisk does for FAT).
> Something like
> 
>   - no write access for >$TIMEOUT, and no fsck in > 1 month?
> -> remount read-only, start fsck in background
>   - On write access (or: after enough write accesses to fill 
> a certain buffer cache):
>   - kill fsck ASAP
> ("ASAP" because fsck might be in the middle of a fix)
>   - remount read-write
>   - execute write request
>   - (optional: wait for $TIMEOUT)
>   - remount read-only
>   - restart fsck
> 
> This is similar to what scandisk, defrag, and all the Windows based FAT
> recovery tools do. I don't know how realistic or time consuming
> implementing this would be for a real file system, but I'm quite sure
> on-line fsck would be just about as appreciated as on-line resizing etc,
> because it potentially saves a lot of downtime. 
Ummm, LVM snapshots (for read-only fsck's - you have to go offline to
rebuild... or maybe juggle 2 filesystems around... but that'd require an
online shrinker - which is planned for v4). Unfortunately, the standard
LVM doesn't do writable snapshots, so you can't take a snapshot and then
fsck the snapshot, because the journal has to be replayed first. Wasn't
there some patch that fixed this?
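The idle-time check Jens proposes in the quoted text is essentially a small state machine. The sketch below models only its control flow. The class name and thresholds are hypothetical, and actions are recorded in a list instead of remounting anything. As Oleg points out elsewhere in the thread, running fsck against a read-only mount is itself not entirely safe.

```python
from dataclasses import dataclass, field


@dataclass
class BackgroundFsckModel:
    """Hypothetical model of the proposed idle-time fsck loop. Nothing here
    touches a real filesystem; actions are only appended to self.log."""
    idle_timeout: int          # seconds without writes before checking
    check_interval: int        # minimum seconds between completed checks
    state: str = "rw"          # "rw" or "checking"
    last_write: int = 0
    last_check: int = 0
    log: list = field(default_factory=list)

    def tick(self, now):
        """Called periodically; starts a check once idle long enough and due."""
        idle = now - self.last_write >= self.idle_timeout
        due = now - self.last_check >= self.check_interval
        if self.state == "rw" and idle and due:
            self.log += ["remount ro", "start fsck"]
            self.state = "checking"

    def write(self, now):
        """A write arrives: abort any running check, then let the write through."""
        if self.state == "checking":
            # fsck may be in the middle of a fix, so it is killed first.
            self.log += ["kill fsck", "remount rw"]
            self.state = "rw"
        self.log.append("execute write")
        self.last_write = now

    def fsck_finished(self, now):
        """The check ran to completion without being interrupted."""
        if self.state == "checking":
            self.log.append("remount rw")
            self.state = "rw"
            self.last_check = now
```

An interrupted check does not update last_check, so the loop retries after the next idle period. That corresponds to the "wait for $TIMEOUT, remount read-only, restart fsck" steps in the proposal.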
> 
> AND, you'd be the first, not only on Linux, to have this, AFAIK there is
> no serious file system that can do this.
> 
> Perhaps this will be possible in Reiser4?
>  
> > > > If you need such a feature, you can easily implement it in your
> > > > initscripts.
> > > How do I find out the mount count of a ReiserFS partition?
> > Sigh. No easy way I can see. But your request is heard and next
> > version of debugreiserfs will print it.
> 
> Know what? You guys are simply great. :-D
> 
> Take your time, fix urgent things first. I don't need this tomorrow.

-- 
The road to Tycho is paved with good intentions





Re: [reiserfs-list] Serious ReiserFS errors when updating from 2.4.18pre9 to rc1

2002-02-18 Thread Oleg Drokin

Hello!

On Mon, Feb 18, 2002 at 03:12:34PM +0100, Jens Benecke wrote:
> > > > (have you tried 2.5.3/2.5.4-pre1 kernels there?)
> > > No. I haven't tried 2.5.x. kernels yet and I'm not about to.
> > Just making sure. Error that can cause these items in wrong order
> > errors was fixed recently, but before it was believed to only cause
> > problems on 2.5, so now we know it can happen on 2.4, too.
> I'd be happy to backtrack this if there is a way. (Perhaps I should have
> saved the metadata before the fsck... well, too late).
This is easily reproducible on 2.5, so there is no point in doing it on 2.4.
2.4 and 2.5 share most of the code, anyway.

> > > See other post, but I cannot reproduce all of this. I really don't
> > > know what went wrong.
> > Probably you used 2.4. kernel without the fix, and then went to
> > 2.4.18-rc1 with a lot of fixes. And these fixes noticed problems.
> I went from 2.4.15pre1 to 2.4.18pre3, which crashed within a couple
> days, so I tried 2.4.18pre7, which also crashed, then I switched some of
> the grsecurity things off and tried pre9, which went well for a few days
> but suddenly didn't let me log on any more, then I tried rc1 with even
> less grsecurity stuff enabled and then all hell broke loose.
> Back to pre9, everything normal, fsck, back to rc1, everything normal -
> for now.
Basically, I think that if you read that some errors were noticed in some
filesystem and then fixed, and you plan to upgrade to that kernel, it is
better to run fsck first, just in case. That way you are sure you were not
bitten by previous errors (and you also have a perfectly valid reason to argue
that the new fixes broke something, if the new code breaks and there were no
errors on the filesystem before it was run).

> > > A basic problem I have with ReiserFS is that the journaling makes
> > > you forget about hard disk errors until you get lots of "permission
> > > denied"s, at which time it is usually quite late to do something.
> > Journal is in no direct relation to those "permission denied"s, that
> > data is not from journal.
> Sorry, what I meant is that the journaling works so "well" that you
> don't notice there is something wrong with your disk any more, which
> perhaps the journaling did NOT fix - until you spot a file that isn't
> accessible any more.
No. Journaling does not work this way. If you have HDD errors, you'd
find their traces in the system log pretty quickly.

> So you can have a corrupted file system without noticing anything
> (because the system came up without errors after the power failure). On
Yes, not running fsck on system startup may hide some stuff,
but after all, a journaling filesystem is supposed to have at least consistent
metadata, so it should not need fsck.  Bugs are another issue, though ;)

> a non-journaling FS, once the OS spots something is wrong it checks the
> _whole_ disk, and so also finds errors that were not related to the
> crash.
But you waste tons of time, that's why people are converting to journaling
filesystems.

> > If you need such a feature, you can easily implement it in your
> > initscripts.
> How do I find out the mount count of a ReiserFS partition?
Sigh. There's no easy way that I can see. But your request has been heard, and
the next version of debugreiserfs will print it.

Bye,
Oleg



Re: [reiserfs-list] Serious ReiserFS errors when updating from 2.4.18pre9 to rc1

2002-02-18 Thread Anders Widman


> Hello!

Hi! =)

> On Mon, Feb 18, 2002 at 12:22:38PM +0100, Jens Benecke wrote:
>> > Blocks in wrong order *is* serious!  
>> Oops.
>> > (have you tried 2.5.3/2.5.4-pre1 kernels there?)
>> No. I haven't tried 2.5.x. kernels yet and I'm not about to.
> Just making sure. Error that can cause these items in wrong order
> errors was fixed recently, but before it was believed to only
> cause problems on 2.5, so now we know it can happen on 2.4, too.

>> > > Anyway, I'm now running the supposedly 'broken' kernel without
>> > > problems of the kind I had the first time - yet. I'll update you as
>> > > soon as anything happens. For now, at least I have a current backup,
>> > > at least of my home directory. :)
>> > Ok.
>> See other post, but I cannot reproduce all of this. I really don't know
>> what went wrong.
> Probably you used 2.4. kernel without the fix, and then went to 2.4.18-rc1
> with a lot of fixes. And these fixes noticed problems.

>> A basic problem I have with ReiserFS is that the journaling makes you
>> forget about hard disk errors until you get lots of "permission
>> denied"s, at which time it is usually quite late to do something.
> Journal is in no direct relation to those "permission denied"s,
> that data is not from journal.

Still, ReiserFS does run quite well on a broken hard drive until you
get to the point of a complete hard drive failure. That has actually
happened to my system.

>> Perhaps ReiserFS should, just like ext2, warn you after 50 mounts (or
>> so) to do a fsck once in a while. It doesn't have to be after the crash,
>> but IMHO you shouldn't forget about fsck completely.
> A lot of people would disagree.
> If you need such a feature, you can easily implement it in your initscripts.

Yes, initscripts are better. But I really think that a system should ideally
never need to be rebooted, which makes all startup functions like fsck
pointless (unless you really have to reboot for some reason).

Bad block handling would be a nice (if not necessary) feature for
ReiserFS. Hopefully it will be implemented soon; or do we need to
upgrade to Reiser4 (will that be possible)?

//Anders

> Bye,
> Oleg




Re: [reiserfs-list] Serious ReiserFS errors when updating from 2.4.18pre9 to rc1

2002-02-18 Thread Oleg Drokin

Hello!

On Mon, Feb 18, 2002 at 12:22:38PM +0100, Jens Benecke wrote:
> > Blocks in wrong order *is* serious!  
> Oops.
> > (have you tried 2.5.3/2.5.4-pre1 kernels there?)
> No. I haven't tried 2.5.x. kernels yet and I'm not about to.
Just making sure. The bug that can cause these "items in wrong order"
errors was fixed recently; it was previously believed to cause problems
only on 2.5, but now we know it can happen on 2.4, too.

> > > Anyway, I'm now running the supposedly 'broken' kernel without
> > > problems of the kind I had the first time - yet. I'll update you as
> > > soon as anything happens. For now, at least I have a current backup,
> > > at least of my home directory. :)
> > Ok.
> See other post, but I cannot reproduce all of this. I really don't know
> what went wrong.
Probably you used a 2.4 kernel without the fix, and then went to 2.4.18-rc1
with a lot of fixes, and these fixes noticed the problems.

> A basic problem I have with ReiserFS is that the journaling makes you
> forget about hard disk errors until you get lots of "permission
> denied"s, at which time it is usually quite late to do something.
The journal has no direct relation to those "permission denied"s;
that data does not come from the journal.

> Perhaps ReiserFS should, just like ext2, warn you after 50 mounts (or
> so) to do a fsck once in a while. It doesn't have to be after the crash,
> but IMHO you shouldn't forget about fsck completely.
A lot of people would disagree.
If you need such a feature, you can easily implement it in your initscripts.
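Oleg suggests implementing the ext2-style periodic check in initscripts. Because ReiserFS did not expose a mount count at the time, an initscript would have to keep its own counter. The sketch below is hypothetical. The state file, its JSON layout and the 50-mount/180-day thresholds are assumptions, not anything reiserfsprogs provides.

```python
import json
import os
import time


def should_fsck(state_path, max_mounts=50, max_age_days=180, now=None):
    """Hypothetical initscript helper: bump a mount counter kept in
    state_path and report whether a reiserfsck run is due.

    Both thresholds are assumptions modeled on ext2's defaults-style behaviour.
    """
    now = time.time() if now is None else now
    state = {"mounts": 0, "last_check": now}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    state["mounts"] += 1
    due = (state["mounts"] >= max_mounts
           or now - state["last_check"] >= max_age_days * 86400)
    with open(state_path, "w") as f:
        json.dump(state, f)
    return due


def record_check(state_path, now=None):
    """Reset the counter after a successful check."""
    now = time.time() if now is None else now
    with open(state_path, "w") as f:
        json.dump({"mounts": 0, "last_check": now}, f)
```

An initscript would call should_fsck() once per boot for each filesystem. When it returns True, the script would run reiserfsck --check and then call record_check() if the check came back clean.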

Bye,
Oleg



Re: [reiserfs-list] Serious ReiserFS errors when updating from 2.4.18pre9 to rc1

2002-02-14 Thread Chris Mason


[ marcelo, you're bcc'd as an FYI, I'll forward details when we figure
this out ]

On Thursday, February 14, 2002 11:46:35 PM +0100 Jens Benecke <[EMAIL PROTECTED]> 
wrote:

> Hi,
> 
> I compiled the 2.4.18rc1 kernel now (because Marcelo wrote "ReiserFS
> fixes" in the changelog) and with that kernel I cannot access half my
> harddisk any more, and syslog complains it cannot find inode stat data
> (or something like that) a thousand times.
> 
> What happened?  I can access the files normally with 2.4.18pre9 and
> below.
> 
> Do I need to worry?

Yes, that would be something to worry about.  Is this disk 3.5.x or
3.6.x, mounted as root?  What kind of hardware is the disk on?

The reiserfs change in rc1 was pretty minor. Are there any other messages
in your logfile?

-chris