Re: is quotacheck slow with reiserfs

2006-10-06 Thread Konstantin Münning
Louis-David Mitterrand wrote:
 On Fri, Oct 06, 2006 at 02:09:11AM +0400, Vladimir V. Saveliev wrote:
 On Friday 06 October 2006 00:34, Louis-David Mitterrand wrote:
 On a 200-user mail server with a 500 GB reiser3 fs, quotacheck takes an
 hour at boot time.

 which linux version is in use on the server?
 
 Debian unstable with latest kernel 2.6.17.
 
 Is that normal? Is there a way to speed it up?

 How do users store their mail? If each mail is stored in a separate
 file, then quotacheck has to iterate over a lot of files, which can be
 very time-consuming.
 
 We use maildir, so, yes, it's a lot of files.
 
 Isn't there a way to run quotacheck in the background while daemons 
 start serving users? 
 
 Or must it absolutely be run at mount time to be effective?

You may run quotacheck at any time if you don't mind slightly incorrect
quotas, as changes made to already checked files while quotacheck is
running won't be noticed. You may also skip quotacheck at startup
altogether - if your server didn't crash, quotas should stay in sync. So
you could run quotacheck at runtime during some off-peak period and
repeat this once in a while.

-- 
Konstantin Münning


Re: BitTorrent+Reiser4: curiouser and curiouser

2006-09-22 Thread Konstantin Münning
David Masover wrote:

(snip)
 It shouldn't be touching the disk AT ALL when there's over a gig of FREE
 RAM (as in, neither buffer nor cache nor actually used yet), and the
 file I'm attempting to download is less than 200 megs.  I tried an
 strace, but as I am not at all skilled in the ways of debugging or
 reverse engineering, I got syscall spam -- a 200 meg log file, and when
 I finally found a decent way to analyze it, I found most of Azureus'
 system call wall time is spent in futex().  Huh?
 
 Looked up futex on Wikipedia, and I still have no clue how this makes
 any sense.  Either futex was somehow thrashing the disk, or Azureus has
 somehow managed to fork completely out of strace's control.  Or maybe
 it's somehow something that the kernel is doing on its own, which is
 somehow forcing azureus to block, but somehow not tripping strace's
 timers while doing so.

Have you used -f or -ff with strace? Without it you would see only the
initial process and not the forked processes. Having the futex call
indicates that there should be child processes, so -f or -ff is a must.
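
To see where the time goes across all threads, the trace can be summarised per system call. This is only a minimal sketch, assuming the trace was captured with something like `strace -f -T -o trace.log <command>`, so that every line starts with a PID and ends with the elapsed time in angle brackets. The function name and the sample lines are made up for illustration:

```python
import re
from collections import defaultdict

# Complete calls, e.g.: 1234 read(3, "...", 4096) = 4096 <0.000021>
CALL = re.compile(r"^(\d+)\s+(\w+)\(.*<(\d+\.\d+)>\s*$")
# Resumed calls, e.g.:  1235 <... futex resumed>) = 0 <1.500000>
RESUMED = re.compile(r"^(\d+)\s+<\.\.\. (\w+) resumed>.*<(\d+\.\d+)>\s*$")


def wall_time_by_syscall(lines):
    """Sum the per-call elapsed time for each syscall and collect the PIDs seen."""
    totals = defaultdict(float)
    pids = set()
    for line in lines:
        match = CALL.match(line) or RESUMED.match(line)
        if not match:
            # Signals, exit notices and "<unfinished ...>" halves carry no time.
            continue
        pid, name, seconds = match.groups()
        pids.add(pid)
        totals[name] += float(seconds)
    return dict(totals), pids


# Hypothetical sample lines in the assumed format.
sample = [
    '1234 read(3, "...", 4096) = 4096 <0.000021>',
    "1235 futex(0x1, FUTEX_WAIT, 2, NULL <unfinished ...>",
    '1234 write(4, "...", 512) = 512 <0.000100>',
    "1235 <... futex resumed>) = 0 <1.500000>",
]
totals, pids = wall_time_by_syscall(sample)
print(sorted(pids))
print(max(totals, key=totals.get))
```

If only one PID shows up in a trace of a multi-threaded program, the trace was most likely taken without -f.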

Just my 2 cents.
-- 
Konstantin Münning


Re: Bug report: reiserfsck --rebuild-tree not progressing

2006-04-11 Thread Konstantin Münning
Hi!

I had the same problem about a year ago with a 0.8TB drive, you may
check some list archives for the details.

The solution was a patch to reiserfsprogs which was then incorporated
in version 3.6.19. I am not familiar with the details, as I only
supplied the information and Vladimir did the work, so I can only repeat
what was said - abort the current fsck and retry with the latest tools
(3.6.19 should be sufficient, I can't tell about 3.6.20).

As for your concerns, it's correct that aborting the current fsck will
leave you with an unmountable FS, but you can repair it with another
run. I doubt there is any way to make the current fsck continue, except
maybe some weird magic hack into the running program ;-). As for the
chance of losing data in this process - I'm not the expert, but I think
an abort is not that critical, at least at the stage where reiserfsck
is in an endless loop. With my problem I had some minor data corruption
issues afterwards, but I think that was because of the primary fault -
the RAID controller had RAM errors and the RAID consistency was broken,
which resulted in the corrupted FS. After several fsck tries,
superblock reconstruction etc. I was surprised how much was still intact
(far less than 1% of the data was corrupted), but this doesn't
necessarily apply to you and is no guarantee.

Have a nice day,
Konstantin

Tyler Phelps wrote:
 My questions (and all of the diagnostic information provided) revolve 
 around a specific process of reiserfsck, using the version that I 
 specified, which is still running.  The only way that I can try a new 
 version is to abort the current fsck operation... doing that 
 essentially invalidates all of the questions that I've asked.
 
 I'm reluctant to abort the current process. My primary reason for this
 is that I have no way of knowing if aborting the current fsck will
 cause further damage. After all, the man page states, "Once reiserfsck
 --rebuild-tree is started it must finish its work (and you should not
 interrupt it), otherwise the filesystem will be left in the unmountable
 state to avoid subsequent data corruptions." Second, I don't know if
 anything is even wrong with the way that things are progressing with
 the current process... hence the reason for my original questions.
 
 -Tyler
 
 On Apr 11, 2006, at 12:21 AM, Sander wrote:
 
 Tyler Phelps wrote (ao):

 Package: reiserfsprogs
 Version: 1:3.6.17-2


 Can you try a newer version?
 ftp://ftp.namesys.com/pub/reiserfsprogs/reiserfsprogs-3.6.19.tar.gz

 According to http://marc.theaimsgroup.com/?t=11423516088r=1w=2
 3.6.20 also exists, but I can't find it.

 Good luck, Sander

 -- 
 Humilis IT Services and Solutions
 http://www.humilis.net


Re: Help tracking down files on a bad block: understanding the output of debugreiserfs

2005-11-22 Thread Konstantin Münning
Hi!

Unfortunately I can't help you with the debugreiserfs output but maybe
with another approach for finding the correct blocknum of the bad
sector(s). Why don't you try

badblocks -b 4096 /dev/hda5

I'm not saying that your approach is wrong, but this way (assuming your
reiserfs block size is the default 4k) you'll be sure to find all bad
blocks. But it takes some time. On the other hand you may try this:

dd of=/dev/null if=/dev/hda5 bs=4k count=1 skip=xxx

with xxx the block number(s) of your calculation or output from
badblocks. Just to check if the block you've found is really unreadable.
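
The block number to put in place of xxx follows from simple arithmetic on the sector numbers. Here is a small sketch using the values from the console log quoted below (LBA sector 32378944, hda5 starting at sector 4128768), assuming 512-byte sectors and 4096-byte ReiserFS blocks. The function name is just for illustration:

```python
SECTOR_SIZE = 512  # bytes per disk sector, as reported by fdisk in the log


def fs_block_for_lba(lba_sector, partition_start, block_size=4096):
    """Translate an absolute LBA sector into a block number within the partition."""
    offset = lba_sector - partition_start
    if offset < 0:
        raise ValueError("sector lies before the start of the partition")
    sectors_per_block = block_size // SECTOR_SIZE
    return offset // sectors_per_block


block = fs_block_for_lba(32378944, 4128768)
print(block)  # 3531272, the same result as the dc session in the log
print(f"dd of=/dev/null if=/dev/hda5 bs=4k count=1 skip={block}")
```

Integer division is used on purpose: any of the 8 sectors inside a 4k block maps to the same block number.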

How to identify the inode number corresponding to that block I can't
tell, but finding the file that belongs to an inode can be done with

find /mount-point-of-fs -inum xxx

where xxx is the inode number in question. See man find.

Hope that helps.
Konstantin

Dewey Sasser wrote:
 Hello all,
 
 I've recently received a bad block notice from SMARTD and I'm trying to
 track down which files might be affected.  I *think* I have the basic
 process correct but I'm missing the final step -- how to get the inode
 of the affected files.  My skill with Google and man pages got me to
 where I am but seems unequal to my remaining task.
 
 My question is:  is the inode of the file in question found in the
 output of debugreiserfs -1 blocknum or is there some other way to get
 the inode?
 
 Attached is a console log documenting the process I'm using so far.  How
 should I interpret this output from debugreiserfs?
 
 Many thanks,
 
 -- 
 Dewey Sasser
 
 
straits messages # grep LBAsect messages.1 | tail -n 1
Nov  9 09:45:34 straits hda: dma_intr: error=0x40 {
UncorrectableError }, LBAsect=32378944, sector=28250176
straits messages # fdisk -ul /dev/hda
 
Disk /dev/hda: 122.9 GB, 122942324736 bytes
255 heads, 63 sectors/track, 14946 cylinders, total 240121728 sectors
Units = sectors of 1 * 512 = 512 bytes
 
   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *          63      128519       64228+  83  Linux
/dev/hda2          128520     4128704         292+  83  Linux
/dev/hda3         4128705   240107489   117989392+   5  Extended
/dev/hda5         4128768    43198784    19535008+  83  Linux
/dev/hda6        43198848   240107489    98454321   83  Linux
straits messages # dc
32378944     # LBA sector
4128768      # starting sector for hda5
-
p
28250176     # result: this is the sector within hda5
8            # convert 512-byte sectors to 4096-byte ReiserFS blocks
/
p
3531272      # result: this is the ReiserFS block containing the bad sector
q
straits messages # debugreiserfs -1 3531272 /dev/hda5
debugreiserfs 3.6.19 (2003 www.namesys.com)
 
3531272 is used in ondisk bitmap
 
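===============================================================================
LEAF NODE (3531272) contains level=1, nr_items=12, free_space=632
rdkey (real items 12)
-------------------------------------------------------------------------------
|###|type|ilen|f/sp| loc|fmt|fsck|                   key                      |
|   |    |    |e/cn|    |   |need|                                            |
-------------------------------------------------------------------------------
|  0|3065513 2078685 0x0 SD (0), len 44, location 4052 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rw-rw-r--, size 0, nlink 1, mtime 24/2005 04:17:29 blocks 0, uid 0
-------------------------------------------------------------------------------
|  1|3065513 2078685 0x1 DRCT (2), len 456, location 3596 entry count 65535, fsck need 0, format new|
-------------------------------------------------------------------------------
|  2|3065513 2269255 0x0 SD (0), len 44, location 3552 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rw-rw-r--, size 610, nlink 1, mtime 02/2005 04:05:49 blocks 8, uid 0
-------------------------------------------------------------------------------
|  3|3065513 2269255 0x1 DRCT (2), len 616, location 2936 entry count 65535, fsck need 0, format new|
-------------------------------------------------------------------------------
|  4|3065513 2269421 0x0 SD (0), len 44, location 2892 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rw-rw-r--, size 505, nlink 1, mtime 01/2005 14:35:40 blocks 8, uid 0
-------------------------------------------------------------------------------
|  5|3065513 2269421 0x1 DRCT (2), len 512, location 2380 entry count 65535, fsck need 0, format new|
-------------------------------------------------------------------------------
|  6|3065513 2269440 0x0 SD (0), len 44, location 2336 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rw-rw-r--, size 371, nlink 1, mtime 21/2005 14:35:50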

Re: Reiser3 bug in 2.6.11.11

2005-11-17 Thread Konstantin Münning
Hi!

[EMAIL PROTECTED] wrote:
 Konstantin Münning schrieb:
 
 init started and first system startup messages appeared. But then a
 bunch of oopses appeared fast so I was not able to find which part of
 the kernel was causing the first error and then the keyboard stopped
 responding so I couldn't scroll back. At that point only powering down
 was possible. I can't tell if it happens while fs was RO or when/after
 it was remounted RW. And as there was no network or disk access at that
 point recovering some information was not possible.
 
 
 can you reproduce the oopses and redirect the errors/oops message to a
 serial console or netconsole? does the error go away with a current
 kernel? perhaps some reiserfs guru can decode them

Sorry, I had no time to experiment as it is an in-production laptop and
had to be functional again fast, so I currently have no way to reproduce
the fault. Redirecting the output would require another kernel, and the
device has no serial or network port :-(.

Maybe extracting metadata (debugreiserfs -p) would have been good for
debugging but at that point I had nowhere to store it. If this happens a
second time I will find a way to save it.

As there were only a few fs errors reported by fsck, looking over the
range/overflow checks in the code regarding used space/block size
might give a hint...

Have a nice day,
-- 
Konstantin Münning


Reiser3 bug in 2.6.11.11

2005-11-14 Thread Konstantin Münning
Hello!

A few days ago I encountered a reiser3 bug in vanilla kernel 2.6.11.11.
I have no idea if it has been fixed in a recent kernel, but here is some
info in case somebody is interested.

Short: some values seem to be untested and a corrupted fs generates
kernel oopses.

Details: on a laptop something caused a fs corruption (probably in
connection with swsusp, but that's only a guess as I got it a few days
later) which caused the kernel to oops/panic/hang shortly after the
first accesses to the disk. Grub seemed to have no problems and initial
access was OK, as init started and the first system startup messages
appeared. But then a bunch of oopses appeared so fast that I was not able
to find which part of the kernel was causing the first error, and then
the keyboard stopped responding so I couldn't scroll back. At that point
only powering down was possible. I can't tell if it happened while the fs
was RO or when/after it was remounted RW. And as there was no network or
disk access at that point, recovering more information was not possible.

But maybe the log files of reiserfsck can help identify the culprit
(could it be something with the blocksize messages?):

reiserfsck:
-
bad_path: block 8435, pointer 11: The used space (3888) of the child
block (32773) is not equal to the (blocksize (4096) - free space (224) -
header size (24))
bad_path: block 2283225, pointer 29: The used space (4072) of the child
block (6160385) is not equal to the (blocksize (4096) - free space (180)
- header size (24))
block 1049101: The number of items (59) is incorrect, should be (57)
 the problem in the internal node occured (1049101), whole subtree is
skipped
bad_path: block 3145901, pointer 40: The used space (2432) of the child
block (557840) is not equal to the (blocksize (4096) - free space (1740)
- header size (24))
vpf-10640: The on-disk and the correct bitmaps differs.
-

reiserfsck --rebuild-tree:
-
### Pass 0 ###
block 1049101: The number of items (59) is incorrect, should be (57) -
corrected
block 1049101: The free space (65504) is incorrect, should be (68) -
corrected
block 1545017: The number of items (2) is incorrect, should be (0) -
corrected
block 1545017: The free space (43432) is incorrect, should be (4072) -
corrected
block 4131356: The number of items (7) is incorrect, should be (0) -
corrected
block 4131356: The free space (0) is incorrect, should be (4072) - corrected
508677 directory entries were hashed with r5 hash.
### Pass 1 ###
### Pass 2 ###
### Pass 3 ###
vpf-10650: The directory [2 5300] has the wrong size in the StatData
(5544) - corrected to (5504)
vpf-10680: The file [397629 106971] has the wrong block count in the
StatData (0) - corrected to (8)
rebuild_semantic_pass: The entry [397629 111711] (xinetd.pid) in
directory [397629 403911] points to nowhere - is removed
vpf-10680: The file [397629 111702] has the wrong block count in the
StatData (8) - corrected to (0)
vpf-10650: The directory [397629 403911] has the wrong size in the
StatData (432) - corrected to (400)
vpf-10650: The directory [102361 1849502] has the wrong size in the
StatData (840) - corrected to (808)
### Pass 3a (lost+found pass) ###
-

As you can see, it seems to be a tiny corruption but with devastating
results ;-). No data seemed to be lost after rebuild-tree.
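
The bad_path messages in the first log state the rule that was violated: the used space of a child block should equal blocksize - free space - header size. A small sketch that redoes this arithmetic for the three reported blocks, with the numbers copied from the log above. It only illustrates the check described in the messages and is not taken from the reiserfsck sources:

```python
def expected_used_space(blocksize, free_space, header_size=24):
    """Used space a child block should report, per the bad_path message."""
    return blocksize - free_space - header_size


# (reported used space, blocksize, free space) copied from the log above
reports = [
    (3888, 4096, 224),
    (4072, 4096, 180),
    (2432, 4096, 1740),
]

mismatches = []
for used, blocksize, free in reports:
    expected = expected_used_space(blocksize, free)
    if used != expected:
        mismatches.append((used, expected, used - expected))
    print(f"reported {used}, expected {expected}, difference {used - expected}")
```

All three blocks report more used space than the rule allows (by 40, 180 and 100 bytes), which is what the messages complain about.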

Have a nice day,
-- 
Konstantin Münning


Re: My Dad suggests a redundant copies plugin

2005-10-25 Thread Konstantin Münning
Sander wrote:
 Hans Reiser wrote (ao):
 
It is only for very important files for computers which have only one
hard drive. Some of the work is with changing fsck.
 
 
 Well, if the files are important, then you should have backups anyway,
 whatever raid or similar you have. I still don't see an advantage in
 having two versions of a file on one disk.

Here is one: something bad happens to your FS so you need to fsck,
rebuild-tree (or whatever the corresponding thing for reiser4 is) or
something like this. Having important files duplicated improves the
chance that at least one copy is still intact afterwards, because
depending on the degree of corruption the rebuilt fs may show the same
files but with different contents. (Verifying which copy is OK is beyond
the scope of this message.)

Yes, having backups is better but that wouldn't help if it's your
laptop, the backups are 2000 miles away in your office and you have only
some Linux Boot CD?! ;-)

I know, that doesn't happen often, but I've had similar situations. As
I've always managed to help myself somehow, missing this feature wouldn't
make me cry, but I would probably use it. Just my two cents.

 But then again, that should not hinder anyone :-)



late delete improving undeleting files on reiser4

2005-10-25 Thread Konstantin Münning
Hi!

The discussion about losing important files made me think about a
feature I met and liked on DR-DOS (later Novell DOS, but I'm not sure if
they kept this feature) which may be nice in Reiser4 and would probably
need an additional plugin.

I'll call it "late delete" as I can't remember what it was called on
DR-DOS. As I'm not aware of all current features of Reiser4 (not using
it yet) please excuse me if this is already implemented or planned
somewhere.

So when deleting files, instead of directly removing the entries from the
directories and freeing the used blocks, they could merely be marked for
deletion, so undeleting would be easy. That's trivial so far. But the
nice (and tricky) part would be to make this transparent, in the sense
that when disk space gets used up such marked files would actually be
deleted and the user wouldn't notice a thing (except possibly some
overhead, which was badly implemented on DR-DOS, but more on that later).

As simple as it may sound here, it needs much more thinking about how to
do it well. Here are some things that come to my mind about it:

Disadvantages:
- overhead when disk space has to be reclaimed (late delete)
- increasing fragmentation as mark-deleted file space increases
effective disk usage %
- security issues if it has bad defaults and/or is poorly configurable
- may need special undelete tool

Advantages:
- undeletes files fast, reliably and easily on a mounted fs
- mark-delete is fast (but see the first disadvantage: it may only be
deferring the needed time to later)
- it is transparent up to the point of configuration and actual undelete
- improves media life on media with limited write cycles (Flash cards,
...) as data writes would cycle through all of the free space of the media

Improvements:
- having an attribute defining which files are to be mark-deleted and
which are deleted regularly
- having a directory attribute for this feature, in addition to the
per-file attribute, as a default for new files
- defining a reclaim strategy such as delete biggest/smallest files
first, delete oldest mark-deleted file first etc.
- having a "keep age" attribute to automatically delete mark-deleted
files after some time
- having a "deleted copies" attribute to define/limit how many copies
of the same deleted file have to be kept
- defining a max. fill-up percentage at which to start reclaiming space.
This way the overhead of reclaiming may be moved to a moment of low disk
usage (except when there is no space left) and the first disadvantage
would be mostly gone.
- a tool to manually purge/clean some mark-deleted files (specific
ones or in a directory/tree)

Other improvements in connection with other plugins:
- compressing mark-deleted files
- having a "wipe file" attribute for security which wipes the data
portion of the deleted file (with zeroes, random data, ...) before
freeing the blocks (possibly in combination with the "keep age"
attribute above)

As for the implementation, the mark-deleted files/directories may be
moved to a (hidden) .deleted directory in the fs root. Depending on the
implementation this may lose the original location of the file and
could disclose otherwise protected files, but it would reduce the
overhead of the "delete oldest mark-deleted file first" strategy. Of
course, using this directory as a kind of metadata listing of the files
while keeping them where they are would do the same but needs more coding.
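
To make the reclaim strategies a bit more concrete, here is a minimal sketch of how the choice of files to purge could look. It is a plain Python model, not Reiser4 plugin code. The class, the function and the strategy names are all made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class MarkDeleted:
    path: str          # original location of the file
    size: int          # bytes that would be freed by really deleting it
    deleted_at: float  # time of the mark-delete (e.g. seconds since epoch)


# Sort keys for the hypothetical reclaim strategies.
STRATEGIES = {
    "oldest": lambda entry: entry.deleted_at,
    "biggest": lambda entry: -entry.size,
    "smallest": lambda entry: entry.size,
}


def pick_victims(entries, bytes_needed, strategy="oldest"):
    """Choose mark-deleted files to purge until bytes_needed would be freed."""
    victims, freed = [], 0
    for entry in sorted(entries, key=STRATEGIES[strategy]):
        if freed >= bytes_needed:
            break
        victims.append(entry)
        freed += entry.size
    return victims


entries = [
    MarkDeleted("/home/a/report.txt", 100, deleted_at=3.0),
    MarkDeleted("/home/b/video.avi", 500, deleted_at=1.0),
    MarkDeleted("/tmp/note", 50, deleted_at=2.0),
]
for victim in pick_victims(entries, 120, "smallest"):
    print(victim.path)
```

With the "oldest" strategy the same request would purge only the 500-byte file, which shows how much the strategy affects what can still be undeleted afterwards.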

Those are my thoughts so far. Comments welcome.
-- 
Konstantin Münning


reiserfstune: block allocator is not defined

2005-10-20 Thread Konstantin Münning
Hi!

Just tried to add some badblocks like this:

reiserfstune -b /tmp/badblocklist /dev/hda5

and I get the error:


block allocator is not defined

Aborted


The same happens when I try it with -B. As this message means nothing to
me, does anybody have an idea what the problem might be? The man pages
and, interestingly, a web search gave me nothing about it. Or is
reiserfstune not meant to be used for adding bad blocks to the fs?

Kernel 2.6.12, reiserfstools 3.6.19.

Thanks!
-- 
Konstantin Münning


Re: mkreiserfs: Meaning of fileysteme-size arg

2005-09-29 Thread Konstantin Münning
Hi!

[EMAIL PROTECTED] wrote:
 Hi,
 
 what is the EXACT meaning of the filesystem-size command line argument to
 mkreiserfs?

according to the man page:

filesystem-size is the size in blocks of the filesystem. If omitted,
mkreiserfs will automatically set it.

 Is it the total size the filesystem occupies, i.e. will the filesystem fit
 on a partition exactly the size in filesystem-size, or is it some sort of
 net value, to which I have to add some overhead.

In my understanding and experience it is the size you want to occupy
with the fs. This would be the size of the partition you want the fs to
fit in. If you don't specify it mkreiserfs will query the size itself.

 Background:
 I want to create a filesystem on a disk partition that can be backed up to
 a dvd+r just by dd-ing an image of the filesystem onto the dvd+r-writer,
 and I'd like to make the filesystem exactly the size the dvd+r can hold.
 Unfortunately, this exact size cannot be expressed by my hard disk geometry
 (i.e. i cannot make a partition exactly that size), so the partition has to
 be a bit bigger than the filesystem. Can I create a reiserfs that is
 smaller than the partition it is contained in, with the filesystem-size
 argument, and will this work reliably?

I assume you want to make it like this to be able to mount the DVD later
for accessing the backup files directly (no untarring).

The idea is nice, but maybe there are better ways. As for reliability -
if you can umount (or remount ro) the partition before backing it up, it
will work reliably. Otherwise you will be saving an unclean partition
with possible corruption, as journal and fs contents may change
independently while you are copying.

Why use reiserfs on a read-only medium? The only point I see is if you
want to keep the positions of the data on the medium after restoring by
dd-ing it back to your partition. Some software copy protections need
this. Otherwise you would be wasting media space on the journal. Creating
a new fs and restoring the files is only a little more effort, but you
always get a clean fs.

If your concern is storing files larger than 2GB, which isofs can't do,
use UDF. The dvd+rw-tools can do this on the fly, so it is as easy as
dd-ing the partition to the DVD.

Do you need a partition for your fs? Arbitrary sizes are possible if
you create an image file and store your fs there. You can mount it using
the loopback device. Otherwise, if you use a partition bigger than the
fs, you need to specify the fs size on the dd command line, as dd will
otherwise transfer the size of the partition, not the size of the
(smaller) fs.
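
As a rough sketch of that last point: the block count given to mkreiserfs as filesystem-size is also the count dd needs, as long as both use the same block size. The capacity value, the device name /dev/hdXN and the image name below are placeholders, not real figures for any particular medium:

```python
def fs_blocks_for_capacity(capacity_bytes, block_size=4096):
    """Largest whole number of fs blocks that fits into capacity_bytes."""
    return capacity_bytes // block_size


def dd_image_command(device, image, fs_blocks, block_size=4096):
    """Build a dd invocation that copies only the fs, not the whole partition."""
    return [
        "dd",
        f"if={device}",
        f"of={image}",
        f"bs={block_size}",
        f"count={fs_blocks}",
    ]


capacity = 4_700_000_000  # placeholder, look up the real capacity of your media
blocks = fs_blocks_for_capacity(capacity)
command = dd_image_command("/dev/hdXN", "backup.img", blocks)
print(blocks)
print(" ".join(command))
```

Rounding down matters here: an image that is even one block too large will not fit on the medium.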

I hope I could help.
Konstantin


Re: we have got hash function screwed up

2005-09-05 Thread Konstantin Münning
Hi!

Gabor HALASZ wrote:

 [EMAIL PROTECTED]:~# touch
 /home/ftpd/pub/debian/pool/main/x/xorg-x11/.in.xserver-xorg_6.8.2.dfsg.1-6_i386.deb
 
 touch: cannot touch
 `/home/ftpd/pub/debian/pool/main/x/xorg-x11/.in.xserver-xorg_6.8.2.dfsg.1-6_i386.deb':
 Device or resource busy

Errors like these I've seen mostly with a corrupted FS. Umount the
partition (reboot from some live CD if it's your root partition) and run
fsck.reiserfs /dev/xxx. I'm quite sure it will find something.

Before trying repairs, check that your disk is operational (at least
badblocks -s /dev/xxx), as otherwise things can get worse. And don't
forget to back up as much of your important data as possible - it is
very likely to survive a repair, but there's no guarantee.

If fsck doesn't show anything, then someone better informed should help you.

Good luck,
Konstantin


Re: Strange problems/bugs with reiserfs and reiserfschk

2005-08-23 Thread Konstantin Münning
Hi Vitaly!

Thank you for the reiserfsck 3.6.20. It did in fact produce different
results on that drive. I ran it in gdb (as I did with 3.6.19 to see
what/where the trouble may be) and the result is:

(***snip***)
vpf-10680: The file [641222 641239] has the wrong block count in the
StatData (1528) - corrected to (1520)
vpf-10680: The file [641222 641241] has the wrong block count in the
StatData (47192) - corrected to (47168)
vpf-10680: The file [641222 641242] has the wrong block count in the
StatData (16624) - corrected to (16528)

are_file_items_correct: All bytes we look for must be first items byte
(position 0).

Program received signal SIGABRT, Aborted.
0xe410 in __kernel_vsyscall ()
(***snip***)

Hmm... Ugly ;-).

Vitaly Fertman wrote:

 if some file item offsets are corrupted, fsck can work for too long on 
 pass2. or it also can be a bug. I will send you a version of reiserfsprogs 
 that has an optimization fix for former. if it fails email me and provide 
 the metadata please:
   debugreiserfs -p device | bzip2 -c > device.bz2

Do you need the metadata or the full logfile or should I send you
something more/else for that?

Thanks and have a nice day,
Konstantin


Re: Strange problems/bugs with reiserfs and reiserfschk

2005-08-13 Thread Konstantin Münning
Hi Everyone.

OK, there definitely seems to be some kind of bug in reiserfsck 3.6.19.
Or is it a feature? ;-)

I tried once again to repair the FS with reiserfsck --rebuild-tree and
here it is again. Near the end of pass 2 (about 20h after starting) the
counter stopped at "left 32022, 500 /sec" but there was still heavy
access to the drive. After about an hour reiserfsck started consuming
100% CPU and is now doing only minimal access to the drive (the drive
light blinks every second or so, SCSI reports about 40 commands for each
of these accesses).

What could be causing this? Is the drive too large for reiserfsck? I
wouldn't think that 0.6TB is, but it is at least consuming quite a lot
of memory:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
15185 root      39  19 76304  51m  688 R 99.9 10.3   4591:30 reiserfsck

I have left it like this for the last 3 days just to make sure that it's
not my lack of patience. But it is still stuck... So, any advice? How can
I find out what is causing reiserfsck to hang? Or would I have to build a
debug version and check for myself?

The drive itself is working, I checked several times, and the server has
been working all the time as well. Here is some reiserfsck output in case
it helps somebody to get an idea:

(***snip***)
block 125371211: The number of items (1) is incorrect, should be (0) -
corrected
block 125371211: The free space (0) is incorrect, should be (4072) -
corrected
block 125506454: The number of items (1) is incorrect, should be (0) -
corrected
block 125506454: The free space (0) is incorrect, should be (4072) -
corrected
pass0: vpf-10160: block 129485584: item 2: No . entry found in the
first item of a directory
pass0: vpf-10160: block 129485584: item 4: No . entry found in the
first item of a directory
pass0: vpf-10160: block 133169584: item 14: No . entry found in the
first item of a directory
pass0: vpf-10160: block 133169648: item 25: No . entry found in the
first item of a directory
pass0: vpf-10160: block 133170064: item 10: No . entry found in the
first item of a directory
pass0: vpf-10160: block 134316400: item 18: No . entry found in the
first item of a directory
pass0: vpf-10560: block 145031172, item 7: Wrong order of items - change
the object_id of the key [2237738 2237741 0x1 DRCT (2)] to 2237740
pass0: vpf-10160: block 145817616: item 1: No . entry found in the
first item of a directory
  left 0, 2623 /sec
914346 directory entries were hashed with r5 hash.
r5 hash is selected
Flushing..finished
Read blocks (but not data blocks) 94209553
Leaves among those 3522616
- corrected leaves 319
- leaves all contents of which could not be saved and deleted 15
pointers in indirect items to wrong area 24 (zeroed)
Objectids found 984719

Pass 1 (will try to insert 3522601 leaves):
### Pass 1 ###
Looking for allocable blocks .. finished
0%20%40%is_leaf_bad: block 35452246, item 25: The corrupted
item found (878203 878203 0x0 SD (0), len 44, location 3056 entry count
65535, fsck need 1, format new)
is_leaf_bad: block 35452246, item 26: The corrupted item found (878203
878203 0x1 DRCT (2), len 1464, location 1592 entry count 65535, fsck
need 1, format new)
is_leaf_bad: WARNING: The leaf (35452246) is formatted badly. Will be
handled on the the pass2.
60%80%100% left 0, 701 /sec
Flushing..finished
3522601 leaves read
3489821 inserted
32780 not inserted
non-unique pointers in indirect items (zeroed) 656
### Pass 2 ###

Pass 2:
0%20%40%..rewrite_file: 2 items of file [2340286 2340312] moved
to [2340286 16]
vpf-10260: The file we are inserting the new item (432679 432760 0xf001
IND (1), len 160, location 3936 entry count 0, fsck need 1, format new)
into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (432679 432828 0x2a001
IND (1), len 132, location 3964 entry count 0, fsck need 1, format new)
into has no StatData, insertion was skipped
(***snip***)
vpf-10260: The file we are inserting the new item (526132 526780 0x1 IND
(1), len 8, location 4088 entry count 0, fsck need 1, format new) into
has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (557044 557073 0x1 IND
(1), len 4, location 4092 entry count 0, fsck need 3, format new) into
has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (558483 558492 0x20001
IND (1), len 96, location 4000 entry count 0, fsck need 1, format new)
into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (759159 759160 0x1 IND
(1), len 3208, location 888 entry count 0, fsck need 1, format new) into
has no StatData, insertion was skipped
 left 32022, 500 /sec



Re: Strange problems/bugs with reiserfs and reiserfschk

2005-08-08 Thread Konstantin Münning
Hi.

This processor produces a lot of heat, but that is only a question of how
you cool it. The system has no trouble with heat or stability. It's a
server which is constantly running, and except for the mentioned problem
there are no other troubles. I can compile things for hours on that
system, so memory and/or heat shouldn't be the problem. When doing disk
access the CPU is mostly idle. Working with lots of files on several
drives has not produced any problems. There are no recent hardware or
software upgrades which coincide with the bug. The only thing which
coincides is the FS corruption. The other drives are still fine.

michael chang wrote:
 On 8/7/05, Konstantin Münning [EMAIL PROTECTED] wrote:
 
There seems to be something I would call a bug in ReiserFS at least in
kernel 2.6.11.11 which can cause the system/computer to freeze. It is
caused by a corruption of the FS but at that point I expected to have
 
Kernel 2.6.11.11, Gentoo-Linux, SMP (HyperThreading P4, 3GHz)
 
 snip
 
 If memory serves me right, any 3GHz processor will get very hot, very
 fast.  Is it possible that it got hot and started messing up data? 
 Maybe consider downclocking your cpu or using CPUFreq (or similar) and
 see if there isn't data loss running at e.g. 2.5 or 1.5 GHz.  Either
 that, or don't run it for more than a few hours at a time.  I'm pretty
 sure 2.6.11.11 has CPUFreq in it somewheres.  Something to look at in
 the future.
 
 Of course, this is all speculation.  I have absolutely no idea.


Strange problems/bugs with reiserfs and reiserfschk

2005-08-07 Thread Konstantin Münning
Hi Folks!

There seems to be something I would call a bug in ReiserFS, at least in
kernel 2.6.11.11, which can cause the system/computer to freeze. It is
triggered by a corruption of the FS, but in that case I expected to have
some inaccessible files, which I already know from FS corruptions, and
not a hung system. If someone thinks it's worth investigating, please
read further.

Yes, I know that working with a corrupt FS is nothing good, but my
intention was simply to save as many of the files as possible before
doing a rebuild-tree, just in case it's all gone after that. As I said,
my experience with corrupt ReiserFS was good, given the knowledge that
some files/directories would be inaccessible. But this time the system
was rendered unusable when accessing certain directories - no more
mount/umount or even sync were possible (they simply did not return), so
there was no way to shut down the machine. IMHO this should be considered
a severe bug - refusing to read corrupt portions of a FS is OK, but
rendering the system unusable is bad.

I'm wondering what kind of information I can provide so the source of
this can be found. Here is some, but if you want more, please tell me:

Kernel 2.6.11.11, Gentoo-Linux, SMP (HyperThreading P4, 3GHz)

Here are some portions of /var/log/messages which may show what it's
about - the messages just before the system became unusable:

**
Aug  5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 27594920. Fsck?
Aug  5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345
does not match to the expected one 1
(snip)
Aug  5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 27594920. Fsck?
Aug  5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345
does not match to the expected one 1
Aug  5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 27594920. Fsck?
Aug  5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345
does not match tond in block 27594920. Fsck?
Aug  5 22:17:22 master ReiserFS: warmatch to the expected one 1
Aug  5 22:17:22 master unparseable log message: nd in block 27594920.
Fsck?
Aug  5 22:17:22 master ReiserFS: warning: is_tree_node: node levematch
to the expected ond in block 27594920. Fsck?
Aug  5 22:17:22 master ReiserFS: warning: is_tree_node: nmatch to the
expected one 1
Aug  5 22:17:22 master unparseable log message: nd in block 27594920.
Fsck?
Aug  5 22:17:22 master ReiserFS: warning: is_tree_node: node lmatch to
the expected ond in block 27594920. Fsck?
Aug  5 22:17:22 master ReiserFS: warning: is_tree_node: node match to
the expected ond in block 27594920. Fsck?
(snip)
Aug  5 22:17:26 master ReiserFS: warning:nd in block 27608085. Fsck?
Aug  5 22:17:26 master ReiserFS: warninmatch to the expected onend in
block 2760 nd in block 27608085. Fsck?
Aug  5 22:17:26 master ReiserFS: warning: is_tree_node: nodematch to the
expected nd in block 27608085. Fsck?
(snip)
Aug  5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 122418791. Fsck?
Aug  5 22:18:03 master ReiserFS: warning: is_tree_node: node level 18499
does not match to the expected one 1
Aug  5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 122418791. Fsck?
Aug  5 22:18:03 master init_special_inode: bogus i_mode (17)
Aug  5 22:18:03 master ReiserFS: warning: is_tree_node: node level 65471
does not match to the expected one 1
Aug  5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key:
invalid format found in block 82406293. Fsck?
**

The interesting point is that the messages get weird at some
point - see the portion after the first (snip). As if something is
overwriting an internal buffer or something. Maybe caused by the high
frequency of messages or some race condition between processors? I have
no idea if this is an indication of the suspected bug, but that seems
likely to me. The last portion is the last messages logged just before
the next boot of the computer. In case you ask - CPU/memory of that
server are fine as far as Memtest86(+) can tell. So, what's next?

Now to the second part. After giving up on saving more data (well, I
saved the important 30% of these 400GB) I started a reiserfsck
--rebuild-tree. It worked quite well until near the end. There it seems
to have frozen and consumes 100% CPU. Here is some data:

reiserfsprogs-3.6.19, messages of reiserfsck:

**
.pass1: block 145817616, item 1, entry 0: The entry .. of the [259961
259992 0x2 DIR (3)] is hashed with not set whereas proper hash is r5 -
deleted
100% left 0, 212 /sec
Flushing..finished
268526 leaves read
203192 inserted
- pointers in indirect items pointing to
metadata 1890 (zeroed)
65334 not inserted
non-unique