Re: is quotacheck slow with reiserfs
Louis-David Mitterrand wrote:
On Fri, Oct 06, 2006 at 02:09:11AM +0400, Vladimir V. Saveliev wrote:
On Friday 06 October 2006 00:34, Louis-David Mitterrand wrote:
On a 200-user mail server with a 500gb reiser3 fs, quotacheck takes an hour at boot time. This is a mail server with 200 users.
which linux version is in use on the server?
Debian unstable with latest kernel 2.6.17.
Is that normal? Is there a way to speed it up?
How do users store their mails? If they store one mail in a separate file, then quotacheck has to iterate over a lot of files which can be very time consuming.
We use maildir, so, yes, it's a lot of files. Isn't there a way to run quotacheck in the background while daemons start serving users? Or must it absolutely be run at mount time to be effective?
You may run quotacheck at any time if you don't mind slightly incorrect quotas, as changes made to already checked files while quotacheck is running won't be noticed. You may also skip quotacheck at startup entirely: if your server didn't crash, quotas should stay in sync. So you could use some off-peak time for quotacheck at runtime and do this once in a while.
-- Konstantin Münning
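The off-peak suggestion above could look roughly like the following. This is only a sketch: the mount point /home and the 02:00-05:00 window are assumptions made for illustration, and whether quotas have to be switched off around the check depends on the local setup.

```shell
# Sketch: run quotacheck at runtime during off-peak hours instead of at boot.
# MOUNTPOINT and the 02:00-05:00 window are assumptions, not from the thread.
MOUNTPOINT="${MOUNTPOINT:-/home}"

is_off_peak() {
    # $1: hour of day (00-23); defaults to the current hour.
    hour=${1:-$(date +%H)}
    hour=${hour#0}                      # "03" -> "3", "0" -> ""
    [ "${hour:-0}" -ge 2 ] && [ "${hour:-0}" -lt 5 ]
}

run_quotacheck() {
    # -u/-g: user and group quotas, -m: do not try to remount read-only,
    # -v: verbose. With DRY_RUN=1 the command is only printed.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "quotacheck -vugm $MOUNTPOINT"
    else
        quotacheck -vugm "$MOUNTPOINT"
    fi
}

# Typical use from cron:  is_off_peak && run_quotacheck
```

Quotas recorded while the check runs may be slightly off, as noted above, so an idle window is the point of the hour test.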
Re: BitTorrent+Reiser4: curiouser and curiouser
David Masover wrote:
(snip)
It shouldn't be touching the disk AT ALL when there's over a gig of FREE RAM (as in, neither buffer nor cache nor actually used yet), and the file I'm attempting to download is less than 200 megs.
I tried an strace, but as I am not at all skilled in the ways of debugging or reverse engineering, I got syscall spam -- a 200 meg log file, and when I finally found a decent way to analyze it, I found most of Azureus' system call wall time is spent in futex(). Huh? Looked up futex on Wikipedia, and I still have no clue how this makes any sense.
Either futex was somehow thrashing the disk, or Azureus has somehow managed to fork completely out of strace's control. Or maybe it's somehow something that the kernel is doing on its own, which is somehow forcing azureus to block, but somehow not tripping strace's timers while doing so.
Have you used -f or -ff with strace? Without it you would see only the initial process and not the forked processes. Having the futex call indicates that there should be child processes, so -f or -ff is a must.
Just my 2 cents.
-- Konstantin Münning
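To make a large trace easier to digest, the time per system call can be summed up from a log written with something like strace -f -T -o trace.log (the -T flag appends the time spent in each call as <seconds>). The helper below is a rough sketch; the function name and the log file name are made up for illustration.

```shell
# Sketch: sum the per-call durations that "strace -T" appends as <seconds>,
# grouped by syscall name, largest total first. Reads files or stdin.
summarize_strace() {
    LC_ALL=C awk '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            if (match($0, /[a-z_][a-z_0-9]*\(/)) {
                name = substr($0, RSTART, RLENGTH - 1)
                total[name] += secs
            }
        }
        END { for (n in total) printf "%s %.6f\n", n, total[n] }
    ' "$@" | LC_ALL=C sort -k2,2nr
}

# Usage:  summarize_strace trace.log | head
```

Lines without a trailing <seconds> field (unfinished or resumed calls, signals) are simply skipped, so the totals are a lower bound.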
Re: Bug report: reiserfsck --rebuild-tree not progressing
Hi!
I had the same problem about a year ago with a 0.8TB drive; you may check some list archives for the details. The solution was a patch to the reiserfsprogs which was then incorporated in version 3.6.19. I am not familiar with the details as I only supplied the information and Vladimir did the work, so I can only repeat what was said: abort the current fsck and retry with the latest tools (3.6.19 should be sufficient, I can't tell about 3.6.20).
As for your concerns, it's correct that aborting the current fsck will result in an unmountable FS, but you can repair it with another run. I doubt there is any way to make the current fsck continue, except maybe some weird magic hack into the running program ;-). As for the chances of losing data in this process: I'm not the expert, but I think an abort is not that critical, at least at the stage where reiserfsck is in an endless loop.
With my problem I had some minor data corruption issues afterwards, but I think that was because of the primary fault: the RAID controller had RAM errors and the RAID consistency was broken, which resulted in the corrupted FS. Then, after several fsck tries, superblock reconstruction etc., I was surprised how much was still intact (far less than 1% of data was corrupted), but don't take this as any guarantee for your case.
Have a nice day,
Konstantin
Tyler Phelps wrote:
My questions (and all of the diagnostic information provided) revolve around a specific process of reiserfsck, using the version that I specified, which is still running. The only way that I can try a new version is to abort the current fsck operation... doing that essentially invalidates all of the questions that I've asked.
I'm reluctant to abort the current process. My primary reason for this is that I have no way of knowing if aborting the current fsck will cause further damage. After all, the man page states, Once reiserfsck --rebuild-tree is started it must finish its work (and you should not interrupt it), otherwise the filesystem will be left in the unmountable state to avoid subsequent data corruptions.
Second, I don't know if anything is even wrong with the way that things are progressing with the current process... hence the reason for my original questions.
-Tyler
On Apr 11, 2006, at 12:21 AM, Sander wrote:
Tyler Phelps wrote (ao):
Package: reiserfsprogs Version: 1:3.6.17-2
Can you try a newer version? ftp://ftp.namesys.com/pub/reiserfsprogs/reiserfsprogs-3.6.19.tar.gz
According to http://marc.theaimsgroup.com/?t=11423516088r=1w=2 3.6.20 also exists, but I can't find it.
Good luck, Sander
-- Humilis IT Services and Solutions http://www.humilis.net
Re: Help tracking down files on a bad block: understanding the output of debugreiserfs
Hi!
Unfortunately I can't help you with the debugreiserfs output, but maybe with another approach for finding the correct block number of the bad sector(s). Why don't you try
badblocks -b 4096 /dev/hda5
I'm not saying that your approach is wrong, but this way (assuming your reiserfs block size is the default 4k) you'll be sure to find all bad blocks. It takes some time, though. On the other hand you may try this:
dd of=/dev/null if=/dev/hda5 bs=4k count=1 skip=xxx
with xxx being the block number(s) from your calculation or from the badblocks output, just to check whether the block you've found is really unreadable.
How to identify the inode number corresponding to that block I can't tell, but finding the file that belongs to an inode can be done with
find /mount-point-of-fs -inum xxx
where xxx is the inode number in question. See man find.
Hope that helps.
Konstantin
Dewey Sasser wrote:
Hello all, I've recently received a bad block notice from SMARTD and I'm trying to track down which files might be affected. I *think* I have the basic process correct but I'm missing the final step -- how to get the inode of the affected files. My skill with Google and man pages got me to where I am but seems unequal to my remaining task.
My question is: is the inode of the file in question found in the output of debugreiserfs -1 blocknum or is there some other way to get the inode? Attached is a console log documenting the process I'm using so far. How should I interpret this output from debugreiserfs?
Many thanks,
-- Dewey Sasser
straits messages # grep LBAsect messages.1 | tail -n 1
Nov 9 09:45:34 straits hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=32378944, sector=28250176
straits messages # fdisk -ul /dev/hda
Disk /dev/hda: 122.9 GB, 122942324736 bytes
255 heads, 63 sectors/track, 14946 cylinders, total 240121728 sectors
Units = sectors of 1 * 512 = 512 bytes
Device     Boot     Start        End      Blocks  Id  System
/dev/hda1  *           63     128519      64228+  83  Linux
/dev/hda2          128520    4128704        292+  83  Linux
/dev/hda3         4128705  240107489  117989392+   5  Extended
/dev/hda5         4128768   43198784   19535008+  83  Linux
/dev/hda6        43198848  240107489   98454321   83  Linux
straits messages # dc
32378944    # LBA Sector
4128768     # Starting sector for hda5
- p
28250176    # result: this is the sector within hda5
8           # convert 512 byte sectors to 4096 byte ReiserFS blocks
/ p
3531272     # result: this is the ReiserFS block containing the bad sector
q
straits messages # debugreiserfs -1 3531272 /dev/hda5
debugreiserfs 3.6.19 (2003 www.namesys.com)
3531272 is used in ondisk bitmap
===
LEAF NODE (3531272) contains level=1, nr_items=12, free_space=632 rdkey (real items 12)
---
|###|type|ilen|f/sp| loc|fmt|fsck| key |
| |||e/cn|| |need||
---
| 0|3065513 2078685 0x0 SD (0), len 44, location 4052 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rw-rw-r--, size 0, nlink 1, mtime 24/2005 04:17:29 blocks 0, uid 0
---
| 1|3065513 2078685 0x1 DRCT (2), len 456, location 3596 entry count 65535, fsck need 0, format new|
---
| 2|3065513 2269255 0x0 SD (0), len 44, location 3552 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rw-rw-r--, size 610, nlink 1, mtime 02/2005 04:05:49 blocks 8, uid 0
---
| 3|3065513 2269255 0x1 DRCT (2), len 616, location 2936 entry count 65535, fsck need 0, format new|
---
| 4|3065513 2269421 0x0 SD (0), len 44, location 2892 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rw-rw-r--, size 505, nlink 1, mtime 01/2005 14:35:40 blocks 8, uid 0
---
| 5|3065513 2269421 0x1 DRCT (2), len 512, location 2380 entry count 65535, fsck need 0, format new|
---
| 6|3065513 2269440 0x0 SD (0), len 44, location 2336 entry count 65535, fsck need 0, format new|
(NEW SD), mode -rw-rw-r--, size 371, nlink 1, mtime 21/2005 14:35:50
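The dc session in this message boils down to one subtraction and one division. As a small sketch (the function name is made up; 512-byte sectors and the default 4096-byte ReiserFS block size are assumed, as in the message):

```shell
# Sketch: convert an absolute LBA sector into a filesystem block number
# inside a partition. Assumes 512-byte sectors; block size defaults to 4096.
lba_to_fs_block() {
    lba=$1
    part_start=$2
    block_size=${3:-4096}
    sectors_per_block=$((block_size / 512))
    echo $(( (lba - part_start) / sectors_per_block ))
}

# Values from the console log: LBAsect=32378944, hda5 starts at sector 4128768.
lba_to_fs_block 32378944 4128768    # prints 3531272

# The result can then be cross-checked read-only, e.g.:
#   dd if=/dev/hda5 of=/dev/null bs=4096 count=1 skip=3531272
```

The result agrees with the block number Dewey passed to debugreiserfs, so the arithmetic in the log is consistent.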
Re: Reiser3 bug in 2.6.11.11
Hi!
[EMAIL PROTECTED] wrote:
Konstantin Münning wrote:
init started and first system startup messages appeared. But then a bunch of oopses appeared fast so I was not able to find which part of the kernel was causing the first error and then the keyboard stops responding so I couldn't scrollback. At that point only powering down was possible. I can't tell if it happens while fs was RO or when/after it was remounted RW. And as there was no network or disk access at that point recovering some information was not possible.
can you reproduce the oopses and redirect the errors/oops message to a serial console or netconsole? does the error go away with a current kernel? perhaps some reiserfs guru can decode them
Sorry, I had no time to play as it is an in-production laptop and had to be functional fast, so I currently have no way to reproduce the fault. Redirecting output would require another kernel and the device has no serial or network port :-(. Maybe extracting metadata (debugreiserfs -p) would have been good for debugging, but at that point I had nowhere to store it. If this happens a second time I will find a way to save it.
As there were only a few fs errors reported by fsck, looking over the range/overflow checks in the code regarding used space/block size might give a hint...
Have a nice day,
-- Konstantin Münning
Reiser3 bug in 2.6.11.11
Hello!
A few days ago I encountered a reiser3 bug in vanilla kernel 2.6.11.11. I have no idea if it has been fixed in a recent kernel, but here is some info if somebody is interested.
Short: some values seem to be untested and a corrupted fs generates kernel oopses.
Details: on a laptop something caused a fs corruption (probably in connection with swsusp, but that's only a guess as I got it a few days later) which caused it to oops/panic/hang shortly after the first accesses to the disk. Grub seems to have no problems and initial access was OK as init started and first system startup messages appeared. But then a bunch of oopses appeared fast so I was not able to find which part of the kernel was causing the first error and then the keyboard stops responding so I couldn't scrollback. At that point only powering down was possible. I can't tell if it happens while fs was RO or when/after it was remounted RW. And as there was no network or disk access at that point recovering some information was not possible.
But maybe the log files of reiserfsck can help identify the culprit (could it be something with the blocksize messages?):
reiserfsck:
-
bad_path: block 8435, pointer 11: The used space (3888) of the child block (32773) is not equal to the (blocksize (4096) - free space (224) - header size (24))
bad_path: block 2283225, pointer 29: The used space (4072) of the child block (6160385) is not equal to the (blocksize (4096) - free space (180) - header size (24))
block 1049101: The number of items (59) is incorrect, should be (57)
the problem in the internal node occured (1049101), whole subtree is skipped
bad_path: block 3145901, pointer 40: The used space (2432) of the child block (557840) is not equal to the (blocksize (4096) - free space (1740) - header size (24))
vpf-10640: The on-disk and the correct bitmaps differs.
-
reiserfsck --rebuild-tree:
-
### Pass 0 ###
block 1049101: The number of items (59) is incorrect, should be (57) - corrected
block 1049101: The free space (65504) is incorrect, should be (68) - corrected
block 1545017: The number of items (2) is incorrect, should be (0) - corrected
block 1545017: The free space (43432) is incorrect, should be (4072) - corrected
block 4131356: The number of items (7) is incorrect, should be (0) - corrected
block 4131356: The free space (0) is incorrect, should be (4072) - corrected
508677 directory entries were hashed with r5 hash.
### Pass 1 ###
### Pass 2 ###
### Pass 3 #
vpf-10650: The directory [2 5300] has the wrong size in the StatData (5544) - corrected to (5504)
vpf-10680: The file [397629 106971] has the wrong block count in the StatData (0) - corrected to (8)
rebuild_semantic_pass: The entry [397629 111711] (xinetd.pid) in directory [397629 403911] points to nowhere - is removed
vpf-10680: The file [397629 111702] has the wrong block count in the StatData (8) - corrected to (0)
vpf-10650: The directory [397629 403911] has the wrong size in the StatData (432) - corrected to (400)
vpf-10650: The directory [102361 1849502] has the wrong size in the StatData (840) - corrected to (808)
### Pass 3a (lost+found pass) #
-
As you can see, it seems to be a tiny corruption but with devastating results ;-). No data seemed to be lost after rebuild-tree.
Have a nice day,
-- Konstantin Münning
Re: My Dad suggests a redundant copies plugin
Sander wrote:
Hans Reiser wrote (ao):
It is only for very important files for computers which have only one hard drive. Some of the work is with changing fsck.
Well, if the files are important, then you should have backups anyway, whatever raid or similar you have. I still don't see an advantage in having two versions of a file on one disk.
Here is one: something bad happens with your FS so you need to fsck, rebuild-tree (or whatever the corresponding thing for reiser4 is) or something like this. Having important files duplicated improves the chance that at least one copy is still intact afterwards, since depending on the degree of corruption the rebuilt fs may show the same files but with different contents. (Verifying which copy is OK is not in the scope of this writing.)
Yes, having backups is better, but that wouldn't help if it's your laptop, the backups are 2000 miles away in your office and you have only some Linux Boot CD?! ;-) I know, that doesn't happen often, but I've had similar situations. As I've always managed to help myself somehow, missing this feature wouldn't make me cry, but I would probably use it.
Just my two cents.
But then again, that should not hinder anyone :-)
late delete improving undeleting files on reiser4
Hi!
The discussion about losing important files made me think about a feature I met and liked on DR-DOS (later Novell DOS, but I'm not sure if they kept this feature) which may be nice in Reiser4 and would probably need an additional plugin. I'll call it late delete as I can't remember what it was called on DR-DOS. As I'm not aware of all current features of Reiser4 (not using it yet), please excuse me if this is already implemented or planned somewhere.
So when deleting files, instead of directly removing the entries from the directories and freeing the used blocks, they could be only marked for deletion so undeleting would be easy. That's trivial so far. But the nice (and tricky) feature would be for this to be transparent, meaning that when disk space gets used up such marked files would actually be deleted and the user wouldn't notice a thing (except possibly some overhead (badly implemented on DR-DOS), but more on that later). As simple as it may sound here, it needs much more thinking about how to do it well. Here are some things that come to my mind about it:
Disadvantages:
- overhead when disk space has to be reclaimed (late delete)
- increasing fragmentation as mark-deleted file space increases effective disk usage %
- security issues when it has bad defaults and/or is poorly configurable
- may need a special undelete tool
Advantages:
- undeletes files fast, reliably and easily on a mounted fs
- mark-delete is fast (see first disadvantage, so it's maybe only using the needed time later)
- it is transparent up to the point of configuration and actual undelete
- improving media life on media with limited write cycles (Flash Cards, ...) as data writes would cycle through all of the free space of the media
Improvements:
- having an attribute defining which file is to be mark-deleted and which regularly deleted
- having a directory attribute for this feature in addition to the per-file attribute, as a default for new files
- defining a reclaim strategy such as delete biggest/smallest files first, delete oldest mark-deleted file first etc.
- having a keep age attribute to automatically delete mark-deleted files after some time
- having a deleted copies attribute to define/limit how many copies of the same deleted file have to be kept
- defining a max. fill-up percentage for starting to reclaim space. By this the overhead of reclaiming may be moved to a moment of low disk usage (except when there is no space left) and the first disadvantage would be mostly gone.
- a tool to manually purge/clean some mark-deleted files (specific ones or in a directory/tree)
Other improvements in connection with other plugins:
- compressing mark-deleted files
- having a wipe file attribute for security which wipes the data portion of the deleted file (with zeroes, random, ...) before freeing the blocks (possibly in combination with the keep age attribute above)
As for the implementation, the mark-deleted files/directories may be moved to a (hidden) .deleted directory in the fs root. Depending on the implementation this may lose the original location of the file and could disclose otherwise protected files, but it would reduce the overhead of the delete oldest mark-deleted file first strategy. Of course, having this directory as a kind of metadata listing of the files and keeping them where they are would do the same but needs more coding.
So far my thoughts. Comments welcome.
-- Konstantin Münning
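To make the idea a bit more tangible, here is a user-space toy version of the mark-delete, reclaim and undelete steps described above. It is purely illustrative: every name in it is made up, it only handles simple file names, and a real Reiser4 plugin would of course work inside the filesystem rather than by moving files around.

```shell
# Hypothetical user-space illustration of "late delete"; all names are made up
# and this is not how a Reiser4 plugin would be implemented.
TRASH_DIR="${TRASH_DIR:-.deleted}"

next_seq() {
    # Monotonic counter kept in a file, printed zero-padded so that the
    # lexical order of trash entries equals the deletion order.
    n=0
    if [ -f "$TRASH_DIR/.seq" ]; then n=$(cat "$TRASH_DIR/.seq"); fi
    n=$((n + 1))
    echo "$n" > "$TRASH_DIR/.seq"
    printf '%08d' "$n"
}

late_rm() {
    # Mark-delete: move each file into the trash directory instead of unlinking.
    mkdir -p "$TRASH_DIR" || return 1
    for f in "$@"; do
        mv -- "$f" "$TRASH_DIR/$(next_seq)_${f##*/}" || return 1
    done
}

late_reclaim() {
    # Reclaim strategy "oldest mark-deleted first": keep at most $1 entries.
    total=$(ls "$TRASH_DIR" | wc -l)
    excess=$((total - $1))
    if [ "$excess" -le 0 ]; then return 0; fi
    ls "$TRASH_DIR" | head -n "$excess" | while IFS= read -r entry; do
        rm -rf -- "$TRASH_DIR/$entry"
    done
}

late_undelete() {
    # Restore the most recently deleted copy of name $1 to path $2.
    found=
    for entry in "$TRASH_DIR"/????????_"$1"; do
        if [ -e "$entry" ]; then found=$entry; fi
    done
    if [ -z "$found" ]; then return 1; fi
    mv -- "$found" "$2"
}
```

Here the reclaim trigger is a simple entry count; the fill-up percentage, keep age and per-file attributes from the list above would replace that threshold in a fuller version.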
reiserfstune: block allocator is not defined
Hi! Just tried to add some badblocks like this: reiserfstune -b /tmp/badblocklist /dev/hda5 and I get the error: block allocator is not defined Aborted The same when I try it with -B. As this message means nothing to me, has somebody any idea what the problem might be? The man pages and interestingly a web search gave me nothing about it. Or is reiserfstune not to be used for adding bad blocks to the fs? Kernel 2.6.12, reiserfstools 3.6.19. Thanks! -- Konstantin Münning
Re: mkreiserfs: Meaning of filesystem-size arg
Hi!
[EMAIL PROTECTED] wrote:
Hi, what is the EXACT meaning of the filesystem-size command line argument to mkreiserfs? according to the man page: filesystem-size is the size in blocks of the filesystem. If omitted, mkreiserfs will automatically set it. Is it the total size the filesystem occupies, i.e. will the filesystem fit on a partition exactly the size in filesystem-size, or is it some sort of net value, to which I have to add some overhead.
In my understanding and experience it is the size you want to occupy with the fs. This would be the size of the partition you want the fs to fit in. If you don't specify it, mkreiserfs will query the size itself.
Background: I want to create a filesystem on a disk partition that can be backed up to a dvd+r just by dd-ing an image of the filesystem onto the dvd+r-writer, and I'd like to make the filesystem exactly the size the dvd+r can hold. Unfortunately, this exact size cannot be expressed by my hard disk geometry (i.e. i cannot make a partition exactly that size), so the partition has to be a bit bigger than the filesystem. Can I create a reiserfs that is smaller than the partition it is contained in, with the filesystem-size argument, and will this work reliably?
I assume you want to do it like this to be able to mount the DVD later for accessing the backup files directly (no untarring). The idea is nice but maybe there are better ways.
As for the reliability: if you can umount (or remount ro) the partition before backing it up, it will work reliably. Otherwise you will be saving an unclean partition with possible corruptions, as journal and fs contents may change independently while you are storing.
Why use reiserfs on a read-only medium? The only sense I see is if you want to keep the positions of the data on the medium after restoring by dd-ing it back to your partition. Some software copy protections need this. Otherwise you would be wasting media for the journal. Creating a new fs and restoring the files is only a little bit more effort, but you always get a clean fs. If your concern is to store files >2GB, which isofs can't, use UDF. The dvd+rw-tools can do this on the fly, so doing this is as easy as dd-ing the partition to the DVD.
Do you need to have a partition for your fs? Arbitrary sizes are possible if you create an image file and store your fs there. You can mount it using the loopback device. Otherwise, if you use a partition bigger than the fs, you need to specify the fs size on the dd command, as otherwise dd will transfer the size of the partition, not the size of the (smaller) fs.
I hope I could help.
Konstantin
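The sizing arithmetic for the smaller-than-partition case can be sketched like this. The capacity figure is an assumption (the nominal single-layer DVD+R size) and should be checked against the actual media; /dev/hdXN and backup.img are placeholders. The script only prints the commands it would suggest.

```shell
# Sketch: size a ReiserFS so that a raw image of it fits a DVD+R exactly.
# DVD_BYTES is an assumption (nominal single-layer DVD+R capacity); verify
# it against your own media. /dev/hdXN is a placeholder device name.
DVD_BYTES=4700372992
BLOCK_SIZE=4096

fs_blocks() {
    # Whole filesystem blocks that fit into $1 bytes at block size $2.
    echo $(( $1 / $2 ))
}

BLOCKS=$(fs_blocks "$DVD_BYTES" "$BLOCK_SIZE")

# Only print the commands; nothing is written to any device here.
echo "mkreiserfs -b $BLOCK_SIZE /dev/hdXN $BLOCKS"
echo "dd if=/dev/hdXN of=backup.img bs=$BLOCK_SIZE count=$BLOCKS"
```

Passing the same block size and count to dd is what keeps it from copying the unused tail of the larger partition.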
Re: we have got hash function screwed up
Hi!
Gabor HALASZ wrote:
[EMAIL PROTECTED]:~# touch /home/ftpd/pub/debian/pool/main/x/xorg-x11/.in.xserver-xorg_6.8.2.dfsg.1-6_i386.deb
touch: cannot touch `/home/ftpd/pub/debian/pool/main/x/xorg-x11/.in.xserver-xorg_6.8.2.dfsg.1-6_i386.deb': Device or resource busy
Errors like these I've seen mostly with a corrupted FS. Umount the partition (reboot from some live-CD if it's your root partition) and do an fsck.reiserfs /dev/xxx. I'm quite sure it will find something. Before trying repairs, check that your disk is operational (at least badblocks -s /dev/xxx) as otherwise things can get worse. And don't forget to back up as much of your important data as possible: it is very likely to survive a repair, but there's no guarantee.
If fsck doesn't show anything, then someone better informed should help you.
Good luck,
Konstantin
Re: Strange problems/bugs with reiserfs and reiserfsck
Hi Vitaly!
Thank you for the reiserfsck 3.9.20. It did in fact give different results on that drive. I ran it in gdb (as I did with 3.6.19 to see what/where the trouble may be) and the result is:
(***snip***)
vpf-10680: The file [641222 641239] has the wrong block count in the StatData (1528) - corrected to (1520)
vpf-10680: The file [641222 641241] has the wrong block count in the StatData (47192) - corrected to (47168)
vpf-10680: The file [641222 641242] has the wrong block count in the StatData (16624) - corrected to (16528)
are_file_items_correct: All bytes we look for must be first items byte (position 0).
Program received signal SIGABRT, Aborted.
0xe410 in __kernel_vsyscall ()
(***snip***)
Hmm... Ugly ;-).
Vitaly Fertman wrote:
if some file item offsets are corrupted, fsck can work for too long on pass2. or it also can be a bug. I will send you a version of reiserfsprogs that has an optimization fix for former. if it fails email me and provide the metadata please:
debugreiserfs -p device | bzip2 -c > device.bz2
Do you need the metadata or the full logfile, or should I send you something more/else for that?
Thanks and have a nice day,
Konstantin
Re: Strange problems/bugs with reiserfs and reiserfsck
Hi Everyone.
OK, there definitely seems to be some kind of bug in reiserfsck 3.6.19. Or is it a feature? ;-)
I tried once again with reiserfsck --rebuild-tree to repair the FS and here it is again. Near the end of pass 2 (about 20h after starting) counting stopped at left 32022, 500 /sec, but there was heavy access to the drive. After about an hour reiserfsck started consuming 100% CPU and is now doing only minimal access to the drive (the drive light blinks every second or so, SCSI reports about 40 commands for each of these accesses).
What could be causing this? Is the drive too large for reiserfsck? I wouldn't believe that 0,6TB is, but it is consuming at least quite a lot of memory:
  PID USER  PR NI  VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
15185 root  39 19 76304 51m 688 R 99.9 10.3 4591:30 reiserfsck
I have left it like this for the last 3 days just to make sure that it's not my lack of patience. But now still... So any advice? How to find out what is causing reiserfsck to hang? Or would I have to build a debug version and check for myself? The drive itself is working, I checked several times, and the server is working all the time as well.
Here is some reiserfsck output in case it helps somebody to get an idea:
(***snip***)
block 125371211: The number of items (1) is incorrect, should be (0) - corrected
block 125371211: The free space (0) is incorrect, should be (4072) - corrected
block 125506454: The number of items (1) is incorrect, should be (0) - corrected
block 125506454: The free space (0) is incorrect, should be (4072) - corrected
pass0: vpf-10160: block 129485584: item 2: No . entry found in the first item of a directory
pass0: vpf-10160: block 129485584: item 4: No . entry found in the first item of a directory
pass0: vpf-10160: block 133169584: item 14: No . entry found in the first item of a directory
pass0: vpf-10160: block 133169648: item 25: No . entry found in the first item of a directory
pass0: vpf-10160: block 133170064: item 10: No . entry found in the first item of a directory
pass0: vpf-10160: block 134316400: item 18: No . entry found in the first item of a directory
pass0: vpf-10560: block 145031172, item 7: Wrong order of items - change the object_id of the key [2237738 2237741 0x1 DRCT (2)] to 2237740
pass0: vpf-10160: block 145817616: item 1: No . entry found in the first item of a directory
left 0, 2623 /sec
914346 directory entries were hashed with r5 hash.
r5 hash is selected
Flushing..finished
Read blocks (but not data blocks) 94209553
Leaves among those 3522616
- corrected leaves 319
- leaves all contents of which could not be saved and deleted 15
pointers in indirect items to wrong area 24 (zeroed)
Objectids found 984719
Pass 1 (will try to insert 3522601 leaves):
### Pass 1 ###
Looking for allocable blocks .. finished
0%20%40%is_leaf_bad: block 35452246, item 25: The corrupted item found (878203 878203 0x0 SD (0), len 44, location 3056 entry count 65535, fsck need 1, format new)
is_leaf_bad: block 35452246, item 26: The corrupted item found (878203 878203 0x1 DRCT (2), len 1464, location 1592 entry count 65535, fsck need 1, format new)
is_leaf_bad: WARNING: The leaf (35452246) is formatted badly. Will be handled on the the pass2.
60%80%100% left 0, 701 /sec
Flushing..finished
3522601 leaves read
3489821 inserted
32780 not inserted
non-unique pointers in indirect items (zeroed) 656
### Pass 2 ###
Pass 2: 0%20%40%..rewrite_file: 2 items of file [2340286 2340312] moved to [2340286 16]
vpf-10260: The file we are inserting the new item (432679 432760 0xf001 IND (1), len 160, location 3936 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (432679 432828 0x2a001 IND (1), len 132, location 3964 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
(***snip***)
vpf-10260: The file we are inserting the new item (526132 526780 0x1 IND (1), len 8, location 4088 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (557044 557073 0x1 IND (1), len 4, location 4092 entry count 0, fsck need 3, format new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (558483 558492 0x20001 IND (1), len 96, location 4000 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
vpf-10260: The file we are inserting the new item (759159 759160 0x1 IND (1), len 3208, location 888 entry count 0, fsck need 1, format new) into has no StatData, insertion was skipped
left 32022, 500 /sec
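For a sense of scale: taken at face value, the last progress line leaves very little work, which is what makes the multi-day stall stand out. A tiny sketch that turns such a line into a naive estimate (the helper name is made up, and the line format is simply the one seen in the output above):

```shell
# Sketch: estimate remaining seconds from a reiserfsck progress line of the
# form "left <items>, <rate> /sec" (format as seen in the output above).
eta_seconds() {
    awk '$1 == "left" && $3 > 0 { sub(/,/, "", $2); print int($2 / $3) }'
}

echo "left 32022, 500 /sec" | eta_seconds    # prints 64
```

About a minute of remaining work at the reported rate, against three days of waiting, points to a loop rather than slow progress.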
Re: Strange problems/bugs with reiserfs and reiserfsck
Hi.
This processor produces a lot of heat, but that is only a question of how you cool it. The system has no trouble with heat or stability. It's a server which is constantly running, and except for the mentioned problem there are no other troubles. I can compile things for hours on that system, so memory and/or heat shouldn't be the problem. When doing disk access the cpu is mostly idle. Working with lots of files on several drives has not produced any problems. There are no recent hardware or software upgrades which coincide with the bug. The only thing which coincides is the FS corruption. The other drives are still fine.
michael chang wrote:
On 8/7/05, Konstantin Münning [EMAIL PROTECTED] wrote:
There seems to be something I would call a bug in ReiserFS at least in kernel 2.6.11.11 which can cause the system/computer to freeze. It is caused by a corruption of the FS but at that point I expected to have
Kernel 2.6.11.11, Gentoo-Linux, SMP (HyperThreading P4, 3GHz)
snip
If memory serves me right, any 3GHz processor will get very hot, very fast. Is it possible that it got hot and started messing up data? Maybe consider downclocking your cpu or using CPUFreq (or similar) and see if there isn't data loss running at e.g. 2.5 or 1.5 GHz. Either that, or don't run it for more than a few hours at a time. I'm pretty sure 2.6.11.11 has CPUFreq in it somewheres. Something to look at in the future.
Of course, this is all speculation. I have absolutely no idea.
Strange problems/bugs with reiserfs and reiserfsck
Hi Folks!
There seems to be something I would call a bug in ReiserFS, at least in kernel 2.6.11.11, which can cause the system/computer to freeze. It is caused by a corruption of the FS, but at that point I expected to have some inaccessible files, which I already know from FS corruptions, and not a hung system. If someone thinks it's worth investigating, please read further.
Yes, I know that working with a corrupt FS is nothing good, but my intention was simply to save as many of the files as possible before doing a rebuild-tree, just in case it's all gone after that. As I said, my experience with corrupt ReiserFS was good, knowing that some files/directories would be inaccessible. But this time the system was rendered unusable when accessing certain directories: no more mount/umount or even sync were possible (they simply did not return), so there was no way to shut down the machine.
IMHO this should be considered a severe bug. Refusing to read corrupt portions of a FS is OK, but rendering the system unusable is bad.
I'm wondering what kind of information I can provide so the source of this can be found. Here is some, but if you want more, please tell me:
Kernel 2.6.11.11, Gentoo-Linux, SMP (HyperThreading P4, 3GHz)
Here are some portions of /var/log/messages which may show what it's about; the messages just before the system became unusable:
**
Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key: invalid format found in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345 does not match to the expected one 1
(snip)
Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key: invalid format found in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345 does not match to the expected one 1
Aug 5 22:17:22 master ReiserFS: sdc1: warning: vs-5150: search_by_key: invalid format found in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node level 57345 does not match tond in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warmatch to the expected one 1
Aug 5 22:17:22 master unparseable log message: nd in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node levematch to the expected ond in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: nmatch to the expected one 1
Aug 5 22:17:22 master unparseable log message: nd in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node lmatch to the expected ond in block 27594920. Fsck?
Aug 5 22:17:22 master ReiserFS: warning: is_tree_node: node match to the expected ond in block 27594920. Fsck?
(snip)
Aug 5 22:17:26 master ReiserFS: warning:nd in block 27608085. Fsck?
Aug 5 22:17:26 master ReiserFS: warninmatch to the expected onend in block 2760 nd in block 27608085. Fsck?
Aug 5 22:17:26 master ReiserFS: warning: is_tree_node: nodematch to the expected nd in block 27608085. Fsck?
(snip)
Aug 5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key: invalid format found in block 122418791. Fsck?
Aug 5 22:18:03 master ReiserFS: warning: is_tree_node: node level 18499 does not match to the expected one 1
Aug 5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key: invalid format found in block 122418791. Fsck?
Aug 5 22:18:03 master init_special_inode: bogus i_mode (17)
Aug 5 22:18:03 master ReiserFS: warning: is_tree_node: node level 65471 does not match to the expected one 1
Aug 5 22:18:03 master ReiserFS: sdc1: warning: vs-5150: search_by_key: invalid format found in block 82406293. Fsck?
**
The interesting point is that the messages get weird at some point: see the portion after the first (snip). It is as if something is overwriting an internal buffer. Maybe it is caused by the high frequency of messages or by some race condition between processors? I have no idea if this is an indication of the suspected bug, but that seems likely to me. The last portion shows the last messages just before the next boot of the computer.
In case you ask: CPU/memory of that server are fine as far as Memtest86(+) can tell.
So, what's next?
Now to the second part. After giving up on saving more data (well, I saved the important 30% of these 400GB) I started a reiserfsck --rebuild-tree. It worked quite well until near the end. There it seems to be frozen and consumes 100% CPU. Here is some data:
reiserfsprogs-3.6.19, messages of reiserfsck:
**
.pass1: block 145817616, item 1, entry 0: The entry .. of the [259961 259992 0x2 DIR (3)] is hashed with not set whereas proper hash is r5 - deleted
100% left 0, 212 /sec
Flushing..finished
268526 leaves read
203192 inserted
- pointers in indirect items pointing to metadata 1890 (zeroed)
65334 not inserted
non-unique
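When sifting through kernel messages like the ones quoted in this mail, it can help to reduce them to the distinct block numbers named in the vs-5150 warnings. A rough sketch follows (the function name is made up; garbled lines that no longer contain the full warning text are simply not counted):

```shell
# Sketch: list the distinct block numbers named in vs-5150 warnings,
# with how often each one was reported. Reads a syslog file or stdin.
bad_blocks_from_log() {
    sed -n 's/.*search_by_key: invalid format found in block \([0-9][0-9]*\)\..*/\1/p' "$@" |
        sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Usage:  bad_blocks_from_log /var/log/messages
```

Each output line is a block number followed by its count, which gives a short list of blocks to compare against what reiserfsck later reports.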