Re: vpf-10680, minor corruptions

2003-06-27 Thread Oleg Drokin
Hello!

On Fri, Jun 27, 2003 at 04:38:00PM +0400, Oleg Drokin wrote:

> I was looking in the wrong direction, when I produced that patch,
> so it will produce zero output.
> I hope to come up with ultimate fix soon enough. ;)

Well, there is a patch below that does *not* work for me ;)
But it should work.
I have traced the new problem to a cross compiler that compiles
code in a different way than the native compiler for whatever reason
(a demo is attached as test.c; it should print "result is 1"
if it is compiled correctly, and stuff about unknown
uniqueness if it is miscompiled. In fact, maybe this is just correct compiler
behaviour.)
I now think that when I compile a kernel with the native compiler, it should work
with the patch below. But it seems I can verify that only tomorrow.
You might try that patch as well to see if it helps you before I try it ;)
The patch is obviously correct (except that it does not work
with my cross compiler, and the kernel does work without the patch, which is
really, really strange).

= fs/reiserfs/bitmap.c 1.26 vs edited =
--- 1.26/fs/reiserfs/bitmap.c   Sun May 18 01:09:36 2003
+++ edited/fs/reiserfs/bitmap.c Fri Jun 27 16:58:44 2003
@@ -43,7 +43,7 @@
 test_bit(_ALLOC_ ## optname , &SB_ALLOC_OPTS(s))
 
 static inline void get_bit_address (struct super_block * s,
-   unsigned long block, int * bmap_nr, int * offset)
+   b_blocknr_t block, int * bmap_nr, int * offset)
 {
 /* It is in the bitmap block number equal to the block
  * number divided by the number of bits in a block. */
@@ -54,7 +54,7 @@
 }
 
 #ifdef CONFIG_REISERFS_CHECK
-int is_reusable (struct super_block * s, unsigned long block, int bit_value)
+int is_reusable (struct super_block * s, b_blocknr_t block, int bit_value)
 {
 int i, j;
 
@@ -107,7 +107,7 @@
 static inline  int is_block_in_journal (struct super_block * s, int bmap, int
 off, int *next)
 {
-unsigned long tmp;
+b_blocknr_t tmp;
 
 if (reiserfs_in_journal (s, bmap, off, 1, &tmp)) {
     if (tmp) {  /* hint supplied */
@@ -235,7 +235,7 @@
 /* Tries to find contiguous zero bit window (given size) in given region of
  * bitmap and place new blocks there. Returns number of allocated blocks. */
 static int scan_bitmap (struct reiserfs_transaction_handle *th,
-   unsigned long *start, unsigned long finish,
+   b_blocknr_t *start, b_blocknr_t finish,
    int min, int max, int unfm, unsigned long file_block)
 {
 int nr_allocated=0;
@@ -281,7 +281,7 @@
 }
 
 static void _reiserfs_free_block (struct reiserfs_transaction_handle *th,
- unsigned long block)
+ b_blocknr_t block)
 {
 struct super_block * s = th->t_super;
 struct reiserfs_super_block * rs;
@@ -327,7 +327,7 @@
 }
 
 void reiserfs_free_block (struct reiserfs_transaction_handle *th, 
-  unsigned long block)
+  b_blocknr_t block)
 {
 struct super_block * s = th->t_super;
 
@@ -340,7 +340,7 @@
 
 /* preallocated blocks don't need to be run through journal_mark_freed */
 void reiserfs_free_prealloc_block (struct reiserfs_transaction_handle *th, 
-  unsigned long block) {
+  b_blocknr_t block) {
 RFALSE(!th->t_super, "vs-4060: trying to free block on nonexistent device");
 RFALSE(is_reusable (th->t_super, block, 1) == 0, "vs-4070: can not free such block");
 _reiserfs_free_block(th, block) ;
@@ -589,15 +589,15 @@
 
 static inline int old_hashed_relocation (reiserfs_blocknr_hint_t * hint)
 {
-unsigned long border;
-unsigned long hash_in;
+b_blocknr_t border;
+u32 hash_in;
 
 if (hint->formatted_node || hint->inode == NULL) {
     return 0;
   }
 
 hash_in = le32_to_cpu((INODE_PKEY(hint->inode))->k_dir_id);
-border = hint->beg + (unsigned long) keyed_hash(((char *) (&hash_in)), 4) % (hint->end - hint->beg - 1);
+border = hint->beg + (u32) keyed_hash(((char *) (&hash_in)), 4) % (hint->end - hint->beg - 1);
 if (border > hint->search_start)
     hint->search_start = border;
 
@@ -606,7 +606,7 @@
   
 static inline int old_way (reiserfs_blocknr_hint_t * hint)
 {
-unsigned long border;
+b_blocknr_t border;
 
 if (hint->formatted_node || hint->inode == NULL) {
     return 0;
@@ -622,7 +622,7 @@
 static inline void hundredth_slices (reiserfs_blocknr_hint_t * hint)
 {
 struct key * key = hint->key;
-unsigned long slice_start;
+b_blocknr_t slice_start;
 
 slice_start = (keyed_hash((char*)(&key->k_dir_id),4) % 100) * (hint->end / 100);
 if ( slice_start > hint->search_start || slice_start + (hint->end / 100) <= hint->search_start) {
@@ -910,7 +910,7 @@
 int reiserfs_can_fit_pages ( struct super_block *sb /* superblock of filesystem
   to estimate space */ )
 

Re: vpf-10680, minor corruptions

2003-06-27 Thread Chris Mason
On Fri, 2003-06-27 at 12:13, Oleg Drokin wrote:
> Hello!
>
> On Fri, Jun 27, 2003 at 04:38:00PM +0400, Oleg Drokin wrote:
>
> > I was looking in the wrong direction, when I produced that patch,
> > so it will produce zero output.
> > I hope to come up with ultimate fix soon enough. ;)
>
> Well, there is a patch below that does *not* work for me ;)
> But it should work.
> I have traced the new problem to a cross compiler that compiles
> code in a different way than the native compiler for whatever reason
> (a demo is attached as test.c; it should print "result is 1"
> if it is compiled correctly, and stuff about unknown
> uniqueness if it is miscompiled. In fact, maybe this is just correct compiler
> behaviour.)
> I now think that when I compile a kernel with the native compiler, it should work
> with the patch below. But it seems I can verify that only tomorrow.
> You might try that patch as well to see if it helps you before I try it ;)
> The patch is obviously correct (except that it does not work
> with my cross compiler, and the kernel does work without the patch, which is
> really, really strange).
 

Most of these changes are in 2.4.21, which I've been using on an AMD64
bit box for a while without any problems.  The bug should be somewhere
else, it looks to me like these spots aren't trying to send an unsigned
long to disk.

-chris




Re: vpf-10680, minor corruptions

2003-06-27 Thread Christian Kujau
Oleg Drokin wrote:
> I have traced the new problem to a cross compiler that compiles
> code in a different way than the native compiler for whatever reason
> (a demo is attached as test.c; it should print "result is 1"
yes, that's what it prints, no warnings were shown.

> You might try that patch as well to see if it helps you before I try it ;)
yes, compiling with _this_ patch but _not_ with the last patch you sent 
(file.c) is under way again...

Thank you,
Christian.


Re: vpf-10680, minor corruptions

2003-06-27 Thread Oleg Drokin
Hello!

On Fri, Jun 27, 2003 at 12:23:07PM -0400, Chris Mason wrote:

> Most of these changes are in 2.4.21, which I've been using on an AMD64

Not the reiserfs_file_write() ones.

> bit box for a while without any problems.  The bug should be somewhere
> else, it looks to me like these spots aren't trying to send an unsigned
> long to disk.

The reiserfs_file_write() code
has an array of b_blocknr_t elements.
It then submits this array to reiserfs_paste_into_item/reiserfs_insert_item,
but b_blocknr_t is unsigned long (read: 64 bits on alpha, oops).
The funny thing is that when I declare b_blocknr_t as u32, the kernel basically
falls apart if cross compiled. E.g. key comparison does not work and
all kinds of weird things start to happen.

In short: if you want to make sure the bug is there, compile 2.5.70+ code
on any 64-bit platform, write any file bigger than 2 blocks,
unmount and remount the fs, and see what's in the file.

Bye,
Oleg


Re: vpf-10680, minor corruptions

2003-06-25 Thread Christian Kujau
Oleg Drokin wrote:
> Try to compile with CONFIG_REISERFS_CHECK=y the kernel that is known-bad for you.
> (e.g. 2.5.72/2.5.73)
yes, 2.5.72 with CONFIG_REISERFS_CHECK=y is compiling now.

overnight the alpha finished compiling 2.5.65 and 2.5.69. i had to 
compile reiserfs statically, inserting modules gave these "Invalid 
module format" errors.

under both (2.5.65+2.5.69) i was able to mkreiserfs sde2. mounting the 
fs went ok, but copying data (cp -a /lib /mnt/reiserfs) brought several 
kernel-errors (see https://ephigenie.kicks-ass.net/browse/reiserfs/).

but: diff -r showed _no_ differences between the directories, a 
following reiserfsck brought no vpf-10680 anymore!

so i'd say the problem occurs somewhere between 2.5.69 and 2.5.70.

thanks,
Christian.


Re: vpf-10680, minor corruptions

2003-06-24 Thread Oleg Drokin
Hello!

On Mon, Jun 23, 2003 at 03:38:20PM +0200, Christian Kujau wrote:

> as stated before, the corruptions occur only on this very alpha machine, 

Well, I still cannot build the kernel myself and am still working on it.
(I get "make: *** [vmlinux] Error 139" and a zero-length vmlinux.)

BTW, I realised that I have not looked at your kernel config for that box;
can you send it to me, please?

> bread: Cannot read the block (523914): (Input/output error).

Hm, but it still means the kernel returned some error for the read request.

> hah! i was not aware that the disk might have an hw problem, not a 
> single error ever showed up in my logs. this was weird. so i 
> re-partitioned the disk with a 10MB sde (to circumvent the bread error) 
> on the beginning and a 2 GB sde2. now reiserfsck/cp/diff are all working 
> fine under 2.4.21, but 2.5.72 is still erroneous.

Sigh.

> btw: i am still using reiserfsprogs 3.6.8 now (since debian/testing has 
> 3.6.6) and i have compiled these utils under a 2.5.72 kernel. is it safe 
> to use them under 2.4 ?

I see that you have used 2.5.70 and earlier kernels on alpha too.
Do you have any idea when stuff broke for you?

Bye,
Oleg


Re: vpf-10680, minor corruptions

2003-06-24 Thread Christian Kujau
Christian Kujau wrote:
> of course, the best thing i can do is the el-cheapo-hacking approach: 
> compiling 2.5.60...up to 2.5.72 and see *when* it breaks. hm, compiling 
> a 2.5 kernel takes 180min on this machine. but anyway, i'll start with 
> 2.5.60 now, see what it gives.
no, i started with 2.5.66 but the kernel did not compile. 2.5.65 did 
compile (don't ask how long) and has already booted. but trying to 
mount the newly created reiserfs gives:

module reiserfs: Relocation overflow vs section 9

in the log. the reiserfs module was not loaded. modprobe reiserfs gives:

lila:~# modprobe reiserfs
FATAL: Error inserting reiserfs 
(/lib/modules/2.5.65/kernel/fs/reiserfs/reiserfs.ko): Invalid module format
lila:~# uname -a
Linux lila 2.5.65 #4 Wed Jun 25 00:48:46 CEST 2003 alpha GNU/Linux

i compiled the module with CONFIG_REISERFS_CHECK=y.

shall i go on with 2.5.64 or better 2.5.67 ?

good night,
Christian.


Re: vpf-10680, minor corruptions

2003-06-24 Thread Oleg Drokin
Hello!

On Wed, Jun 25, 2003 at 02:42:24AM +0200, Christian Kujau wrote:
> (/lib/modules/2.5.65/kernel/fs/reiserfs/reiserfs.ko): Invalid module format
> lila:~# uname -a
> Linux lila 2.5.65 #4 Wed Jun 25 00:48:46 CEST 2003 alpha GNU/Linux
> i compiled the module with CONFIG_REISERFS_CHECK=y.
> shall i go on with 2.5.64 or better 2.5.67 ?

Try to compile with CONFIG_REISERFS_CHECK=y the kernel that is known-bad for you.
(e.g. 2.5.72/2.5.73)

Bye,
Oleg


Re: vpf-10680, minor corruptions

2003-06-18 Thread Christian Kujau
Oleg Drokin wrote:
> Hm, interesting. Did you have crashes/unexpected shutdowns before the corruptions appeared,
> or do they appear without any reason at all?
i had this issue once before -- did a check and noticed vpf-10680/some 
corruptions. but these must have been from a crash.
but now, i think as i rebooted the machine yesterday (because i upgraded 
to kernel 2.5.72) the journal was checked (replayed?) anyway at boot:

found reiserfs format 3.6 with standard journal
Reiserfs journal params: device sde2, size 8192, journal first block 18, 
max trans len 1024, max batch 900, max commit age 30, max trans age 30
reiserfs: checking transaction log (sde2) for (sde2)
Using r5 hash to sort names

(from dmesg, booting process)

and i thought the fs is O.K. at least after boot, because ReiserFS 
cares about consistency for itself. if not, the corruptions are likely 
from the unclean shutdowns. but that would mean that i still have to 
manually reiserfsck from time to time.

btw, is there a switch like "Maximum mount count before doing the next 
fsck" while booting?

> Well, I guess it's time to clear the dust off our alpha and do some testing.
hehe, should it be architecture related?

Thank you,
Christian.