[2.6.19-rc2-mm1] error: too few arguments to function ‘crypto_alloc_hash’
Hello,

The latest -mm introduced a new error:

  CC      fs/reiser4/plugin/crypto/digest.o
fs/reiser4/plugin/crypto/digest.c: In function ‘alloc_sha256’:
fs/reiser4/plugin/crypto/digest.c:17: error: too few arguments to function ‘crypto_alloc_hash’
make[2]: *** [fs/reiser4/plugin/crypto/digest.o] Error 1
make[1]: *** [fs/reiser4] Error 2
make: *** [fs] Error 2

Andrew Wade
[patch] Fix use after free in jrelse_tail
Hello Alexander,

[nikita-1936] assertion failed: reiser4_no_counters_are_held() turned out to be a bug in the debugging code. I've applied the patch below and haven't had a recurrence.

Cheers,
Andrew Wade

signed-off-by [EMAIL PROTECTED]

diff -rupN a/fs/reiser4/jnode.c b/fs/reiser4/jnode.c
--- a/fs/reiser4/jnode.c	2006-09-01 16:44:51.0 -0400
+++ b/fs/reiser4/jnode.c	2006-09-01 16:58:06.0 -0400
@@ -999,10 +999,10 @@ void jrelse_tail(jnode * node /* jnode t
 {
 	assert("nikita-489", atomic_read(&node->d_count) > 0);
 	atomic_dec(&node->d_count);
-	/* release reference acquired in jload_gfp() or jinit_new() */
-	jput(node);
 	if (jnode_is_unformatted(node) || jnode_is_znode(node))
 		LOCK_CNT_DEC(d_refs);
+	/* release reference acquired in jload_gfp() or jinit_new() */
+	jput(node);
 }
 
 /* drop reference to node data. When last reference is dropped, data are
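The ordering problem the patch addresses can be illustrated with a toy model. This is only a sketch, not reiser4 code: `Node`, `FreedError`, `release_buggy` and `release_fixed` are hypothetical names, and "freed" is simulated with a flag. It assumes the final `put()` releases the object, so any later read stands in for a use after free.

```python
class FreedError(Exception):
    """Raised when a released node is touched (stands in for a use after free)."""


class Node:
    """Toy stand-in for a jnode: a reference count plus a 'freed' flag."""

    def __init__(self, refs=1):
        self.refs = refs
        self.freed = False

    def put(self):
        # Drop one reference; the last put releases the node.
        self.refs -= 1
        if self.refs == 0:
            self.freed = True

    def is_znode(self):
        # Any read of a released node counts as a use after free.
        if self.freed:
            raise FreedError("node read after final put")
        return True


def release_buggy(node):
    # Old order: the reference is dropped first ...
    node.put()
    # ... and the node is inspected afterwards, when it may be gone.
    return node.is_znode()


def release_fixed(node):
    # Patched order: inspect the node while the reference is still held.
    result = node.is_znode()
    node.put()
    return result
```

With a node holding a single reference, `release_fixed(Node())` returns normally while `release_buggy(Node())` raises `FreedError`. With two or more references both orders behave the same, which may be why a bug of this kind would show up only rarely.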
Re: assertion failed: JF_ISSET(jprivate(page), JNODE_DIRTY)
On Wednesday 30 August 2006 06:26, Alexander Zarochentsev wrote:
> On 30 August 2006 01:38, Andrew James Wade wrote:
> > I now have a stack trace for this assertion:
>
> there is a race between znode_make_dirty and flushing dirty node to disk.
> I guess (but not sure by 100%) it has no bad effect so the assertion is wrong.

Okay, I'll change that to a WARN_ON in my tree and see what falls out.

Thanks,
Andrew Wade
[patch] Re: assertion failed: can_hit_entd(ctx, s)
Hello Alexander,

In addition to your patch, I've also applied the patch below. With these two patches the fs is much more stable for me. However, something is holding a d_ref across the calls to reiser4_writepage. It's not clear to me that this is allowed, so my patch may not be a full fix.

Andrew Wade

signed-off-by: [EMAIL PROTECTED]

diff -rupN a/fs/reiser4/plugin/item/extent_file_ops.c b/fs/reiser4/plugin/item/extent_file_ops.c
--- a/fs/reiser4/plugin/item/extent_file_ops.c	2006-08-28 11:30:33.0 -0400
+++ b/fs/reiser4/plugin/item/extent_file_ops.c	2006-08-29 13:06:20.0 -0400
@@ -1320,20 +1320,22 @@ static int extent_readpage_filler(void *
 					TWIG_LEVEL, CBK_UNIQUE, NULL);
 		if (result != CBK_COORD_FOUND) {
 			reiser4_unset_hint(hint);
-			return result;
+			goto out;
 		}
 		ext_coord->valid = 0;
 	}
 
 	if (zload(ext_coord->coord.node)) {
 		reiser4_unset_hint(hint);
-		return RETERR(-EIO);
+		result = RETERR(-EIO);
+		goto out;
 	}
 	if (!item_is_extent(&ext_coord->coord)) {
 		/* tail conversion is running in parallel */
 		zrelse(ext_coord->coord.node);
 		reiser4_unset_hint(hint);
-		return RETERR(-EIO);
+		result = RETERR(-EIO);
+		goto out;
 	}
 
 	if (ext_coord->valid == 0)
@@ -1358,6 +1360,10 @@ static int extent_readpage_filler(void *
 	} else
 		reiser4_unset_hint(hint);
 	zrelse(ext_coord->coord.node);
+
+out:
+	/* Calls to this function may be intermingled with VM writeback. */
+	reiser4_txn_restart_current();
 	return result;
 }
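The shape of the change, where every early return is routed through one exit point so that a cleanup call runs on all paths, can be sketched outside the kernel. This is a minimal illustration under stated assumptions: `readpage_filler`, `steps` and `restart_txn` are hypothetical names, and `restart_txn` merely stands in for reiser4_txn_restart_current().

```python
def readpage_filler(steps, restart_txn):
    """Run each step in order and stop at the first non-zero result.

    steps       -- callables returning 0 on success or a negative errno
    restart_txn -- hypothetical stand-in for reiser4_txn_restart_current()
    """
    try:
        for step in steps:
            result = step()
            if result != 0:
                # Early exit, like the former 'return' sites in the C code.
                return result
        return 0
    finally:
        # Runs on every path, playing the role of the 'out:' label.
        restart_txn()
```

In C the same guarantee needs the explicit `goto out`; the `finally` clause is simply the closest Python equivalent of a single exit label.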
Re: assertion failed: JF_ISSET(jprivate(page), JNODE_DIRTY)
I now have a stack trace for this assertion:

reiser4 panicked cowardly: reiser4[tar(5412)]: reiser4_set_page_dirty_internal (fs/reiser4/page_cache.c:475)[]: assertion failed: JF_ISSET(jprivate(page), JNODE_DIRTY)
 [c0103870] dump_trace+0x64/0x1ad
 [c01039cb] show_trace_log_lvl+0x12/0x25
 [c0103cc1] show_trace+0xd/0x10
 [c0103cdb] dump_stack+0x17/0x19
 [c01a0caf] reiser4_do_panic+0x4e/0x84
 [c01e66e9] reiser4_set_page_dirty_internal+0xe5/0xee
 [c01cb2a4] znode_make_dirty+0x271/0x452
 [c022108f] cut_node40+0x191/0x1a6
 [c0221410] shift_node40+0x36c/0x91d
 [c01b11ba] carry_shift_data+0xaa/0x139
 [c01b2e1b] carry_insert_flow+0x1de/0x837
 [c01b008f] reiser4_carry+0x185/0x49a
 [c01b8d77] reiser4_insert_flow+0x16b/0x17e
 [c022deec] reiser4_write_tail+0x5cd/0x685
 [c020445f] batch_write_unix_file+0x26e/0x467
 [c0133347] generic_file_buffered_write+0xd2/0x1fb
 [c0135035] __generic_file_aio_write_nolock+0x3a8/0x3e5
 [c01350ca] generic_file_aio_write+0x58/0xab
 [c014eda6] do_sync_write+0xb4/0xf2
 [c014f38e] vfs_write+0x8a/0x136
 [c014faca] sys_write+0x3b/0x60
 [c01028cd] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [c01039cb] show_trace_log_lvl+0x12/0x25
 [c0103cc1] show_trace+0xd/0x10
 [c0103cdb] dump_stack+0x17/0x19
 [c01a0caf] reiser4_do_panic+0x4e/0x84
 [c01e66e9] reiser4_set_page_dirty_internal+0xe5/0xee
 [c01cb2a4] znode_make_dirty+0x271/0x452
 [c022108f] cut_node40+0x191/0x1a6
 [c0221410] shift_node40+0x36c/0x91d
 [c01b11ba] carry_shift_data+0xaa/0x139
 [c01b2e1b] carry_insert_flow+0x1de/0x837
 [c01b008f] reiser4_carry+0x185/0x49a
 [c01b8d77] reiser4_insert_flow+0x16b/0x17e
 [c022deec] reiser4_write_tail+0x5cd/0x685
 [c020445f] batch_write_unix_file+0x26e/0x467
 [c0133347] generic_file_buffered_write+0xd2/0x1fb
 [c0135035] __generic_file_aio_write_nolock+0x3a8/0x3e5
 [c01350ca] generic_file_aio_write+0x58/0xab
 [c014eda6] do_sync_write+0xb4/0xf2
 [c014f38e] vfs_write+0x8a/0x136
 [c014faca] sys_write+0x3b/0x60
 [c01028cd] sysenter_past_esp+0x56/0x8d
===
: jnode: 0, tree: 0 (r:0,w:0), dk: 0 (r:0,w:0)
jload: 0, txnh: 0, atom: 0, stack: 0, txnmgr: 0, ktxnmgrd: 0, fq: 0
inode: 0, cbk_cache: 0 (r:0,w0), eflush: 0, zlock: 0,
spin: 0, long: 3 inode_sem: (r:0,w:1)
d: 3, x: 6, t: 0
Kernel panic - not syncing: reiser4[tar(5412)]: reiser4_set_page_dirty_internal (fs/reiser4/page_cache.c:475)[]: assertion failed: JF_ISSET(jprivate(page), JNODE_DIRTY)

Andrew Wade
Re: assertion failed: can_hit_entd(ctx, s)
> btw, is [EMAIL PROTECTED] your mail address (it is from Reply-To:) ?

Reply-to fixed; thanks. The above address is an ephemeral one that I subscribed to the mailing list with, and it could go away at any time.

> can you please try the following patch:

Will do.

Andrew Wade
Re: Reiser4 stress test.
On Tuesday 22 August 2006 01:23, Hans Reiser wrote:
> Thanks Andrew, please be patient and persistent with us at this time, as
> one programmer is on vacation, and the other is only able to work a few
> hours a day due to an illness.

No problem. I'll post what I find to the list; the posts will still be there when you have the time to devote to solving bugs. The delay will do me no harm whatsoever and I may even get to the bottom of one or two bugs in the meantime. (I happen to have time to spare at the moment).

Andrew Wade
Re: Reiser4 stress test.
On Tuesday 22 August 2006 01:23, Hans Reiser wrote:
> Thanks Andrew, please be patient and persistent with us at this time, as
> one programmer is on vacation, and the other is only able to work a few
> hours a day due to an illness.

No problem. I'll post what I find to the list; the posts will still be there when you have time to devote to chasing bugs. They're not urgent problems for me; I just happen to have the time and interest to devote myself to solving them right now, and it appears I'll be able to muddle through the code okay.

Andrew
Reiser4 stress test.
Hello,

I've been having problems with Reiser 4 panicking for a few months, and I've recently had time to investigate the matter. I've created a program that can crash my system in a few minutes. It's based on kmail's disk activity and consists of small, separated writes to a file that is also mmapped.

=== scatteredwrites ===
#!/usr/bin/python
import os
import mmap
import optparse

parser = optparse.OptionParser(description=
    "Creates a file in $CWD and performs a pattern of reads and writes to it "
    "in an attempt to trigger fs bugs. The file is broken up into regions: "
    "for each region the entire region is read, then some portion of it is "
    "written to. \nDistilled from kmail workload.")
parser.add_option("--region-size", dest="regionsize", default=65536, type="int",
                  help="Set region size to BYTES", metavar="BYTES")
parser.add_option("--region-count", dest="regioncount", default=2048, type="int",
                  help="Set number of regions to COUNT", metavar="COUNT")
parser.add_option("--write-offset", dest="writeoffset", default=0, type="int",
                  help="Offset write by BYTES in each region", metavar="BYTES")
parser.add_option("--write-size", dest="writesize", default=256, type="int",
                  help="Size of write in each region.", metavar="BYTES")
options, args = parser.parse_args()

f = open("scatteredwrites.%d.tmp" % (os.getpid()), "w+b")
try:
    # Fill the file with regioncount regions of "A"s.
    writestr = "A" * options.regionsize
    for i in xrange(options.regioncount):
        f.write(writestr)
    f.close()

    f = open("scatteredwrites.%d.tmp" % (os.getpid()), "r+b")
    writestr = "B" * options.writesize
    # Keep the whole file mapped while it is read and written through f.
    dummy = mmap.mmap(f.fileno(), options.regionsize * options.regioncount,
                      mmap.MAP_SHARED)
    while True:
        for i in xrange(options.regioncount):
            # Read the whole region, then seek back and overwrite part of it.
            f.seek(i * options.regionsize, 0)
            f.read(options.regionsize)
            f.seek(- options.regionsize + options.writeoffset, 1)
            f.write(writestr)
except KeyboardInterrupt:
    os.unlink("scatteredwrites.%d.tmp" % (os.getpid()))
==

Without fs load this stress test rarely causes problems.
But with five instances running in parallel with five instances of a large grep (or patch, or tar), my computer crashes on a timescale of 10 minutes.

I've also added a few patches to my kernel to help me debug the problems I've been having:

diff -rupN a/fs/reiser4/page_cache.c b/fs/reiser4/page_cache.c
--- a/fs/reiser4/page_cache.c	2006-08-19 19:45:57.0 -0400
+++ b/fs/reiser4/page_cache.c	2006-08-19 20:23:43.0 -0400
@@ -489,12 +489,9 @@ static int can_hit_entd(reiser4_context
 		return 1;
 	if (ctx->super != s)
 		return 1;
-	if (get_super_private(s)->entd.tsk == current)
-		return 0;
-	if (!lock_stack_isclean(&ctx->stack))
-		return 0;
-	if (ctx->trans->atom != NULL)
-		return 0;
+	assert("ajw-1", get_super_private(s)->entd.tsk != current);
+	assert("ajw-2", lock_stack_isclean(&ctx->stack));
+	assert("ajw-3", ctx->trans->atom == NULL);
 	return 1;
 }
diff -rupN 2.6.18-rc4-mm1/fs/reiser4/debug.c linux/fs/reiser4/debug.c
--- 2.6.18-rc4-mm1/fs/reiser4/debug.c	2006-08-18 19:21:13.0 -0400
+++ linux/fs/reiser4/debug.c	2006-08-18 19:24:35.0 -0400
@@ -56,6 +56,9 @@ static char panic_buf[REISER4_PANIC_MSG_
  */
 static DEFINE_SPINLOCK(panic_guard);
 
+static void print_lock_counters(const char *prefix,
+				const reiser4_lock_counters_info * info);
+
 /* Your best friend. Call it on each occasion. This is called by
    fs/reiser4/debug.h:reiser4_panic(). */
 void reiser4_do_panic(const char *format /* format string */ , ... /* rest */ )
@@ -74,6 +77,8 @@ void reiser4_do_panic(const char *format
 	vsnprintf(panic_buf, sizeof(panic_buf), format, args);
 	va_end(args);
 	printk(KERN_EMERG "reiser4 panicked cowardly: %s", panic_buf);
+	dump_stack();
+	print_lock_counters("", reiser4_lock_counters());
 	spin_unlock(&panic_guard);
 
 	/*

I've also added this bugfix by Alexander Zarochentsev [EMAIL PROTECTED]:

Index: linux-2.6-git/fs/reiser4/as_ops.c
===
--- linux-2.6-git.orig/fs/reiser4/as_ops.c
+++ linux-2.6-git/fs/reiser4/as_ops.c
@@ -350,6 +350,11 @@ int reiser4_releasepage(struct page *pag
 	if (PageDirty(page))
 		return 0;
 
+	/* extra page reference is used by reiser4 to protect
+	 * jnode<->page link from this ->releasepage(). */
+	if (page_count(page) > 3)
+		return 0;
+
 	/* releasable() needs jnode lock, because it looks at the jnode fields
 	 * and we need jload_lock here to avoid races with jload(). */
 	spin_lock_jnode(node);

Andrew Wade
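The parallel load described in this message (five copies of the stress test running next to five large greps) could be started from a small driver. The following is a rough sketch, not the setup actually used: `build_commands` and `run_load` are hypothetical helpers, the script path, grep pattern and directory are placeholders, and it assumes the scatteredwrites script is executable and run through its own shebang line.

```python
import subprocess
import sys


def build_commands(script, grep_root, instances=5, pattern="reiser4"):
    """Return argv lists for `instances` writers and `instances` greps."""
    # The stress test is started directly so that its shebang line picks
    # the interpreter it was written for.
    writers = [[script] for _ in range(instances)]
    greps = [["grep", "-r", pattern, grep_root] for _ in range(instances)]
    return writers + greps


def run_load(commands):
    """Start every command at once, then wait for all of them to exit."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for cmd in commands
    ]
    return [proc.wait() for proc in procs]
```

A call such as `run_load(build_commands("./scatteredwrites", "/usr/src/linux"))` would then block until interrupted, since the stress test loops until it receives a keyboard interrupt. The greps finish on their own, so a longer run would need them restarted in a loop.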
assertion failed: can_hit_entd(ctx, s)
This is the most common panic I've been getting. It looks like this:

(2.6.18-rc4-mm1)
reiser4 panicked cowardly: reiser4[scatteredwrites(4506)]: reiser4_writepage (fs/reiser4/page_cache.c:522)[]: assertion failed: can_hit_entd(ctx, s)
Kernel panic - not syncing: reiser4[scatteredwrites(4506)]: reiser4_writepage (fs/reiser4/page_cache.c:522)[]: assertion failed: can_hit_entd(ctx, s)

With the extra patches it looks like this:

(2.6.18-rc4-mm2)
reiser4 panicked cowardly: reiser4[grep(4918)]: can_hit_entd (fs/reiser4/page_cache.c:494)[ajw-3]: assertion failed: ctx->trans->atom == NULL
 [c0103870] dump_trace+0x64/0x1ad
 [c01039cb] show_trace_log_lvl+0x12/0x25
 [c0103cc1] show_trace+0xd/0x10
 [c0103cdb] dump_stack+0x17/0x19
 [c01a0ccf] reiser4_do_panic+0x4e/0x7b
 [c01e67b1] reiser4_writepage+0xab/0x1a8
 [c013b973] shrink_inactive_list+0x37d/0x6f0
 [c013bd94] shrink_zone+0xae/0xcc
 [c013c265] try_to_free_pages+0x139/0x20d
 [c0136f12] __alloc_pages+0x189/0x27d
 [c014c2ce] cache_alloc_refill+0x2d2/0x5a0
 [c014bfc7] kmem_cache_alloc+0x70/0xa5
 [c01eb68c] reiser4_alloc_inode+0x51/0xfa
 [c0163adc] alloc_inode+0x14/0x122
 [c0164ad5] iget5_locked+0x3f/0x132
 [c01f4091] reiser4_iget+0x8b/0x361
 [c01fadd8] reiser4_lookup_common+0xef/0x151
 [c015aef7] do_lookup+0xa0/0x13d
 [c015b72f] __link_path_walk+0x79b/0xbd4
 [c015bbb6] link_path_walk+0x4e/0xc6
 [c015c0e3] do_path_lookup+0x203/0x21d
 [c015c544] __path_lookup_intent_open+0x44/0x76
 [c015c5d2] path_lookup_open+0x10/0x12
 [c015c7c7] open_namei+0x61/0x570
 [c014e72d] do_filp_open+0x1f/0x35
 [c014e83e] do_sys_open+0x3f/0xba
 [c014e8e5] sys_open+0x16/0x18
 [c01028cd] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [c01039cb] show_trace_log_lvl+0x12/0x25
 [c0103cc1] show_trace+0xd/0x10
 [c0103cdb] dump_stack+0x17/0x19
 [c01a0ccf] reiser4_do_panic+0x4e/0x7b
 [c01e67b1] reiser4_writepage+0xab/0x1a8
 [c013b973] shrink_inactive_list+0x37d/0x6f0
 [c013bd94] shrink_zone+0xae/0xcc
 [c013c265] try_to_free_pages+0x139/0x20d
 [c0136f12] __alloc_pages+0x189/0x27d
 [c014c2ce] cache_alloc_refill+0x2d2/0x5a0
 [c014bfc7] kmem_cache_alloc+0x70/0xa5
 [c01eb68c] reiser4_alloc_inode+0x51/0xfa
 [c0163adc] alloc_inode+0x14/0x122
 [c0164ad5] iget5_locked+0x3f/0x132
 [c01f4091] reiser4_iget+0x8b/0x361
 [c01fadd8] reiser4_lookup_common+0xef/0x151
 [c015aef7] do_lookup+0xa0/0x13d
 [c015b72f] __link_path_walk+0x79b/0xbd4
 [c015bbb6] link_path_walk+0x4e/0xc6
 [c015c0e3] do_path_lookup+0x203/0x21d
 [c015c544] __path_lookup_intent_open+0x44/0x76
 [c015c5d2] path_lookup_open+0x10/0x12
 [c015c7c7] open_namei+0x61/0x570
 [c014e72d] do_filp_open+0x1f/0x35
 [c014e83e] do_sys_open+0x3f/0xba
 [c014e8e5] sys_open+0x16/0x18
 [c01028cd] sysenter_past_esp+0x56/0x8d
===
: jnode: 0, tree: 0 (r:0,w:0), dk: 0 (r:0,w:0)
jload: 0, txnh: 0, atom: 0, stack: 0, txnmgr: 0, ktxnmgrd: 0, fq: 0
inode: 0, cbk_cache: 0 (r:0,w0), eflush: 0, zlock: 0,
spin: 0, long: 0 inode_sem: (r:0,w:0)
d: 0, x: 0, t: 0
Kernel panic - not syncing: reiser4[grep(4918)]: can_hit_entd (fs/reiser4/page_cache.c:494)[ajw-3]: assertion failed: ctx->trans->atom == NULL

reiser4 panicked cowardly: reiser4[scatteredwrites(4245)]: can_hit_entd (fs/reiser4/page_cache.c:494)[ajw-3]: assertion failed: ctx->trans->atom == NULL
 [c0103870] dump_trace+0x64/0x1ad
 [c01039cb] show_trace_log_lvl+0x12/0x25
 [c0103cc1] show_trace+0xd/0x10
 [c0103cdb] dump_stack+0x17/0x19
 [c01a0ccf] reiser4_do_panic+0x4e/0x7b
 [c01e67b1] reiser4_writepage+0xab/0x1a8
 [c013b973] shrink_inactive_list+0x37d/0x6f0
 [c013bd94] shrink_zone+0xae/0xcc
 [c013c265] try_to_free_pages+0x139/0x20d
 [c0136f12] __alloc_pages+0x189/0x27d
 [c01388a7] __do_page_cache_readahead+0xcc/0x1d2
 [c0138f07] blockable_page_cache_readahead+0x51/0xd9
 [c0139010] make_ahead_window+0x81/0xa4
 [c013918a] page_cache_readahead+0x157/0x176
 [c023aa82] reiser4_read_extent+0x374/0x6ab
 [c020511f] read_unix_file+0x5c7/0x762
 [c014f1e2] vfs_read+0x88/0x134
 [c014fa4e] sys_read+0x3b/0x60
 [c01028cd] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [c01039cb] show_trace_log_lvl+0x12/0x25
 [c0103cc1] show_trace+0xd/0x10
 [c0103cdb] dump_stack+0x17/0x19
 [c01a0ccf] reiser4_do_panic+0x4e/0x7b
 [c01e67b1] reiser4_writepage+0xab/0x1a8
 [c013b973] shrink_inactive_list+0x37d/0x6f0
 [c013bd94] shrink_zone+0xae/0xcc
 [c013c265] try_to_free_pages+0x139/0x20d
 [c0136f12] __alloc_pages+0x189/0x27d
 [c01388a7] __do_page_cache_readahead+0xcc/0x1d2
 [c0138f07] blockable_page_cache_readahead+0x51/0xd9
 [c0139010] make_ahead_window+0x81/0xa4
 [c013918a] page_cache_readahead+0x157/0x176
 [c023aa82] reiser4_read_extent+0x374/0x6ab
 [c020511f] read_unix_file+0x5c7/0x762
 [c014f1e2] vfs_read+0x88/0x134
 [c014fa4e] sys_read+0x3b/0x60
 [c01028cd] sysenter_past_esp+0x56/0x8d
===
: jnode: 0, tree: 0 (r:0,w:0), dk: 0 (r:0,w:0)
jload: 0, txnh: 0, atom: 0, stack: 0,
assertion failed: JF_ISSET(jprivate(page), JNODE_DIRTY)
This one hasn't recurred, so I don't have a stack trace. I haven't looked into it.

(2.6.18-rc4-mm1)
reiser4 panicked cowardly: reiser4[patch(9302)]: reiser4_set_page_dirty_internal (fs/reiser4/page_cache.c:475)[]: assertion failed: JF_ISSET(jprivate(page), JNODE_DIRTY)
Kernel panic - not syncing: reiser4[patch(9302)]: reiser4_set_page_dirty_internal (fs/reiser4/page_cache.c:475)[]: assertion failed: JF_ISSET(jprivate(page), JNODE_DIRTY)

Andrew Wade
assertion failed: keyeq(znode_get_rd_key(node), znode_get_ld_key(node->right))
I looked at this one for a bit; I couldn't make any headway. I don't fully understand what the debugging code for the delimiting keys is doing.

(2.6.18-rc4-mm1)
reiser4 panicked cowardly: reiser4[ent:hdb1!(2167)]: sibling_list_remove (fs/reiser4/tree_walk.c:814)[zam-32245]: assertion failed: keyeq(znode_get_rd_key(node), znode_get_ld_key(node->right))
Kernel panic - not syncing: reiser4[ent:hdb1!(2167)]: sibling_list_remove (fs/reiser4/tree_walk.c:814)[zam-32245]: assertion failed: keyeq(znode_get_rd_key(node), znode_get_ld_key(node->right))

(2.6.18-rc4-mm1)
reiser4 panicked cowardly: reiser4[ent:hdb1!(2175)]: sibling_list_remove (fs/reiser4/tree_walk.c:814)[zam-32245]: assertion failed: keyeq(znode_get_rd_key(node), znode_get_ld_key(node->right))
 [c0103754] dump_trace+0x64/0x181
 [c0103883] show_trace_log_lvl+0x12/0x25
 [c0103b79] show_trace+0xd/0x10
 [c0103b93] dump_stack+0x17/0x19
 [c01a0663] reiser4_do_panic+0x4e/0x7b
 [c01ee6bd] sibling_list_remove+0x85/0x52e
 [c01ba97d] forget_znode+0x22b/0x33b
 [c01b76e0] longterm_unlock_znode+0x268/0x723
 [c01da260] handle_pos_on_formatted+0x35c/0x45f
 [c01da3fc] handle_pos_on_leaf+0x4d/0x61
 [c01d6a84] squalloc+0x16/0x52
 [c01d89f7] jnode_flush+0x80e/0x99d
 [c01d8fee] flush_current_atom+0x468/0x722
 [c01cf073] flush_some_atom+0x9c3/0xb13
 [c01f4216] reiser4_writeout+0x1a6/0x30c
 [c01f554b] entd+0x1e2/0x3d5
 [c0124545] kthread+0xaf/0xde
 [c03eb415] kernel_thread_helper+0x5/0xb
DWARF2 unwinder stuck at kernel_thread_helper+0x5/0xb
Leftover inexact backtrace:
 [c0103883] show_trace_log_lvl+0x12/0x25
 [c0103b79] show_trace+0xd/0x10
 [c0103b93] dump_stack+0x17/0x19
 [c01a0663] reiser4_do_panic+0x4e/0x7b
 [c01ee6bd] sibling_list_remove+0x85/0x52e
 [c01ba97d] forget_znode+0x22b/0x33b
 [c01b76e0] longterm_unlock_znode+0x268/0x723
 [c01da260] handle_pos_on_formatted+0x35c/0x45f
 [c01da3fc] handle_pos_on_leaf+0x4d/0x61
 [c01d6a84] squalloc+0x16/0x52
 [c01d89f7] jnode_flush+0x80e/0x99d
 [c01d8fee] flush_current_atom+0x468/0x722
 [c01cf073] flush_some_atom+0x9c3/0xb13
 [c01f4216] reiser4_writeout+0x1a6/0x30c
 [c01f554b] entd+0x1e2/0x3d5
 [c0124545] kthread+0xaf/0xde
 [c03eb415] kernel_thread_helper+0x5/0xb
===
: jnode: 0, tree: 1 (r:0,w:1), dk: 1 (r:0,w:1)
jload: 0, txnh: 0, atom: 0, stack: 0, txnmgr: 0, ktxnmgrd: 0, fq: 0
inode: 0, cbk_cache: 0 (r:0,w0), eflush: 0, zlock: 1,
spin: 3, long: 1 inode_sem: (r:0,w:0)
d: 1, x: 4, t: -1
Kernel panic - not syncing: reiser4[ent:hdb1!(2175)]: sibling_list_remove (fs/reiser4/tree_walk.c:814)[zam-32245]: assertion failed: keyeq(znode_get_rd_key(node), znode_get_ld_key(node->right))

Andrew Wade
[nikita-1936] assertion failed: reiser4_no_counters_are_held()
This one has only occurred once. I looked fairly carefully at the code for partially converted files under the assumption that the rest was unlikely to be buggy, but nothing stood out at me.

reiser4 panicked cowardly: reiser4[fixdep(19237)]: reiser4_done_context (fs/reiser4/context.c:181)[nikita-1936]: assertion failed: reiser4_no_counters_are_held()
 [c0103754] dump_trace+0x64/0x181
 [c0103883] show_trace_log_lvl+0x12/0x25
 [c0103b79] show_trace+0xd/0x10
 [c0103b93] dump_stack+0x17/0x19
 [c01a0663] reiser4_do_panic+0x4e/0x7b
 [c01bdbc0] reiser4_exit_context+0xa1/0x575
 [c0202bc9] release_unix_file+0x1b7/0x1c2
 [c014f90b] __fput+0xbe/0x16c
 [c014f9e7] fput+0x2e/0x33
 [c014d3ec] filp_close+0x51/0x5b
 [c014ddd2] sys_close+0x70/0x93
 [c01028a5] sysenter_past_esp+0x56/0x8d
DWARF2 unwinder stuck at sysenter_past_esp+0x56/0x8d
Leftover inexact backtrace:
 [c0103883] show_trace_log_lvl+0x12/0x25
 [c0103b79] show_trace+0xd/0x10
 [c0103b93] dump_stack+0x17/0x19
 [c01a0663] reiser4_do_panic+0x4e/0x7b
 [c01bdbc0] reiser4_exit_context+0xa1/0x575
 [c0202bc9] release_unix_file+0x1b7/0x1c2
 [c014f90b] __fput+0xbe/0x16c
 [c014f9e7] fput+0x2e/0x33
 [c014d3ec] filp_close+0x51/0x5b
 [c014ddd2] sys_close+0x70/0x93
 [c01028a5] sysenter_past_esp+0x56/0x8d
===
: jnode: 0, tree: 0 (r:0,w:0), dk: 0 (r:0,w:0)
jload: 0, txnh: 0, atom: 0, stack: 0, txnmgr: 0, ktxnmgrd: 0, fq: 0
inode: 0, cbk_cache: 0 (r:0,w0), eflush: 0, zlock: 0,
spin: 0, long: 0 inode_sem: (r:0,w:0)
d: 1, x: -2, t: -2
Kernel panic - not syncing: reiser4[fixdep(19237)]: reiser4_done_context (fs/reiser4/context.c:181)[nikita-1936]: assertion failed: reiser4_no_counters_are_held()

I should be looking for an un-zrelse'd znode for this bug, correct?

Andrew Wade
Re: [nikita-3002]: assertion failed: carry_level_invariant(doing, CARRY_DOING)
On Wednesday 16 August 2006 09:32, Benjamin Vander Jagt wrote:
> I am having the exact same problems but with one difference. After a
> while, the drive starts thrashing, and the system becomes totally
> unresponsive.

I've been getting occasional short freezes of a couple of minutes. But that's probably unrelated: as I have debugging turned on and am deliberately stressing the fs poor performance is not unexpected.

> ...
> Andrew, may I ask for the contents of your /proc/meminfo file?

Sure:

MemTotal:        512648 kB
MemFree:          70612 kB
Buffers:           2800 kB
Cached:          105236 kB
SwapCached:       33812 kB
Active:          335988 kB
Inactive:         63028 kB
SwapTotal:      9791608 kB
SwapFree:       9757768 kB
Dirty:               84 kB
Writeback:            0 kB
AnonPages:       267960 kB
Mapped:           52792 kB
Slab:             22760 kB
PageTables:        3148 kB
NFS Unstable:         0 kB
Bounce:               0 kB
CommitLimit:   10047932 kB
Committed_AS:    686084 kB
VmallocTotal:    515796 kB
VmallocUsed:      25572 kB
VmallocChunk:    489680 kB
HugePages_Total:      0
HugePages_Free:       0
HugePages_Rsvd:       0
Hugepagesize:      4096 kB

I am currently trying to distill a test-case for crashing the fs. It is going slowly, but I have managed to provoke a few panics, including some new ones:

reiser4 panicked cowardly: reiser4[scatteredwrites(4506)]: reiser4_writepage (fs/reiser4/page_cache.c:522)[]: assertion failed: can_hit_entd(ctx, s)
Kernel panic - not syncing: reiser4[scatteredwrites(4506)]: reiser4_writepage (fs/reiser4/page_cache.c:522)[]: assertion failed: can_hit_entd(ctx, s)

reiser4 panicked cowardly: reiser4[tar(4238)]: reiser4_update_extent (fs/reiser4/plugin/item/extent_file_ops.c:807)[]: assertion failed: reiser4_lock_counters()->d_refs == 0
Kernel panic - not syncing: reiser4[tar(4238)]: reiser4_update_extent (fs/reiser4/plugin/item/extent_file_ops.c:807)[]: assertion failed: reiser4_lock_counters()->d_refs == 0

reiser4 panicked cowardly: reiser4[patch(9302)]: reiser4_set_page_dirty_internal (fs/reiser4/page_cache.c:475)[]: assertion failed: JF_ISSET(jprivate(page), JNODE_DIRTY)
Kernel panic - not syncing: reiser4[patch(9302)]: reiser4_set_page_dirty_internal (fs/reiser4/page_cache.c:475)[]: assertion failed: JF_ISSET(jprivate(page), JNODE_DIRTY)

These are all for 2.6.18-rc4-mm1 + the small patch upthread.

Andrew Wade
Re: [nikita-3002]: assertion failed: carry_level_invariant(doing, CARRY_DOING)
On Friday 11 August 2006 05:15, Vladimir V. Saveliev wrote:
> Hello
>
> On Thursday 10 August 2006 21:55, Andrew James Wade wrote:
> > Hello,
> >
> > I've had another panic on a fscked filesystem:
> >
> > reiser4 panicked cowardly: reiser4[updatedb(3302)]: reiser4_writepage (fs/reiser4/page_cache.c:521)[]: assertion failed: can_hit_entd(ctx, s)
> > Kernel panic - not syncing: reiser4[updatedb(3302)]: reiser4_writepage (fs/reiser4/page_cache.c:521)[]: assertion failed: can_hit_entd(ctx, s)
>
> What kernel do you use? Recently we had few fixes of such problem.

2.6.18-rc3-mm2 + the patch below.

I've been unable to observe any corruption in over 300 GB of file data written to the hd, so I don't think I have a hardware issue. I will continue poking away at the problem.

Andrew Wade

--
re-add to reiser4_releasepage mistakenly removed page_count check.
extra page reference is used to protect page from detaching from the jnode.

Signed-off-by: Alexander Zarochentsev [EMAIL PROTECTED]
---

 fs/reiser4/as_ops.c |    5 +
 1 file changed, 5 insertions(+)

Index: linux-2.6-git/fs/reiser4/as_ops.c
===
--- linux-2.6-git.orig/fs/reiser4/as_ops.c
+++ linux-2.6-git/fs/reiser4/as_ops.c
@@ -350,6 +350,11 @@ int reiser4_releasepage(struct page *pag
 	if (PageDirty(page))
 		return 0;
 
+	/* extra page reference is used by reiser4 to protect
+	 * jnode<->page link from this ->releasepage(). */
+	if (page_count(page) > 3)
+		return 0;
+
 	/* releasable() needs jnode lock, because it looks at the jnode fields
 	 * and we need jload_lock here to avoid races with jload(). */
 	spin_lock_jnode(node);
Re: [nikita-3002]: assertion failed: carry_level_invariant(doing, CARRY_DOING)
Hello,

I've had another panic on a fscked filesystem:

reiser4 panicked cowardly: reiser4[updatedb(3302)]: reiser4_writepage (fs/reiser4/page_cache.c:521)[]: assertion failed: can_hit_entd(ctx, s)
Kernel panic - not syncing: reiser4[updatedb(3302)]: reiser4_writepage (fs/reiser4/page_cache.c:521)[]: assertion failed: can_hit_entd(ctx, s)

It's getting pretty obvious that there must be something unusual/unique in my setup that's giving me grief. My guess would be that data is getting corrupted going between the drive and memory. I do have my pci bus underclocked to 30 MHz so maybe that's a factor. I have had problems with memory corruption in the past (hence the underclocking), but I haven't had any of the symptoms of memory corruption re-appearing. (Note that /dev/hdb is my /home filesystem only, so it's plausible that problems there would mostly tickle reiser4 code).

If that's what is going on, I would expect file contents to be corrupted as well. I'm going to whip up some scripts to exercise reading and writing large amounts of data to the disk and see if I can find corruption of the data. (I hope to be able to use O_DIRECT to avoid thrashing).

I suppose another possibility is that there is something strange in my filesystem that survives fsck, but causes problems. Given the variety of symptoms (and the lack of other reports) I would tend to discount that though. For the record this is what fsck keeps telling me:

FSCK: Node (33160105), item (0), [29:1(SD):0:2a:0]: the slot (9) contains the invalid opset member (compress mode), id (2).
FSCK: Node (33160105), item (0), [29:1(SD):0:2a:0]: removing broken slots.
FSCK: Node (33160105), item (0), [29:1(SD):0:2a:0]: item has the wrong length (94). Should be (90). Fixed.

I'm going to run fsck twice in a row to verify that fsck fixes the problems, but I'm working under the assumption that what fsck is finding is unrelated.

I think the ball is in my court: fortunately I now have time to devote to investigation. I'll let you know what I find.

Comments?

Andrew Wade
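A data-integrity check of the kind proposed in this message could look roughly like the following. It is a sketch under stated assumptions, not the scripts that were actually written: `write_blocks` and `verify_blocks` are hypothetical helpers, and the sketch uses ordinary buffered I/O rather than O_DIRECT, so a re-read may be served from the page cache unless the file is much larger than RAM.

```python
import hashlib
import os


def write_blocks(path, block_count, block_size=1 << 20):
    """Write random blocks to `path`; return the SHA-256 digest of each block."""
    digests = []
    with open(path, "wb") as f:
        for _ in range(block_count):
            block = os.urandom(block_size)
            digests.append(hashlib.sha256(block).hexdigest())
            f.write(block)
        f.flush()
        # Ask the kernel to push the data to the device before re-reading.
        os.fsync(f.fileno())
    return digests


def verify_blocks(path, digests, block_size=1 << 20):
    """Re-read `path` and return the indices of blocks whose digest changed."""
    bad = []
    with open(path, "rb") as f:
        for index, expected in enumerate(digests):
            block = f.read(block_size)
            if hashlib.sha256(block).hexdigest() != expected:
                bad.append(index)
    return bad
```

An empty list from `verify_blocks` means every block read back with the digest recorded at write time. Keeping only digests in memory, rather than the data itself, lets the file size exceed RAM.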
Re: [nikita-3002]: assertion failed: carry_level_invariant(doing, CARRY_DOING)
Hello,

I have had another assertion fail. This one is with 2.6.18-rc2-mm1 + the fix in reiser4_releasepage. This was on a filesystem that had not been unmounted cleanly. (2.6.18-rc3-mm1 crashed on me).

reiser4 panicked cowardly: reiser4[ktxnmgrd:hdb1:r(1977)]: sibling_list_remove (fs/reiser4/tree_walk.c:813)[zam-32245]: assertion failed: keyeq(znode_get_rd_key(node), znode_get_ld_key(node->right))
Kernel panic - not syncing: reiser4[ktxnmgrd:hdb1:r(1977)]: sibling_list_remove (fs/reiser4/tree_walk.c:813)[zam-32245]: assertion failed: keyeq(znode_get_rd_key(node), znode_get_ld_key(node->right))

The next boot had this diagnostic:

reiser4[kde-config(3707)]: present_lw_sd (fs/reiser4/plugin/item/static_stat.c:276)[]: WARNING: partially converted file is encountered

and is continuing to work fine. I have not yet fscked the filesystem.

I hope this helps,
Andrew Wade
Re: [nikita-3002]: assertion failed: carry_level_invariant(doing, CARRY_DOING)
I've just had some warnings show up in my kernel log. I don't know if they're related to the troubles I've been having (I fscked after the last panic).

reiser4[updatedb(32445)]: key_warning (fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]: WARNING: Error for inode 401698 (-2)
for key: (6211c:1:656e646f727365:0:62122:0)[stat data]
reiser4[updatedb(32445)]: key_warning (fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]: WARNING: Error for inode 401697 (-2)
for key: (6211c:1:6576656e74732e:0:62121:0)[stat data]
reiser4[updatedb(32445)]: key_warning (fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]: WARNING: Error for inode 401694 (-2)
for key: (6211c:1:736e617073686f:0:6211e:0)[stat data]
reiser4[updatedb(32445)]: key_warning (fs/reiser4/plugin/file_plugin_common.c:513)[nikita-717]: WARNING: Error for inode 401699 (-2)
for key: (6211c:1:776f6d656e5f6c:0:62123:0)[stat data]

Hope this helps,
Andrew Wade
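The numbers in these warnings can be cross-checked with a few lines of arithmetic. The sketch below only decodes what is printed: the fifth key field is the reported inode number in hexadecimal, and -2 is the negated errno ENOENT. Reading the third field as the leading characters of a file name is an assumption on my part; all that is certain is that its bytes decode to printable ASCII.

```python
import errno
import os

# Values copied from the first warning above.
inode_number = 401698
error_code = -2
key = "6211c:1:656e646f727365:0:62122:0"

fields = key.split(":")

# The fifth key field, read as hexadecimal, equals the reported inode number.
object_id = int(fields[4], 16)

# The third field is a run of hex byte values; decode them as ASCII.
# Treating the result as a file-name prefix is an assumption.
name_bytes = bytes.fromhex(fields[2]).decode("ascii")

# Kernel code returns negated errno values, so -2 corresponds to errno 2.
error_name = errno.errorcode[-error_code]

print(object_id == inode_number, name_bytes, error_name, os.strerror(-error_code))
```

The same decoding applied to the other three keys gives inode numbers 401697, 401694 and 401699, matching the warnings line for line.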
Re: [nikita-3002]: assertion failed: carry_level_invariant(doing, CARRY_DOING)
Thanks. I've applied the patch, and I'll let you know if any errors recur.

Andrew Wade
[nikita-3002]: assertion failed: carry_level_invariant(doing, CARRY_DOING)
Hello,

Every few weeks reiser4 panics on me, generally while kmail is receiving emails. Until recently, the panic was invalid opcode: [#1] (previously reported), but I have some new errors:

The first is:

reiser4 panicked cowardly: reiser4[less(7234)]: set_file_state (fs/reiser4/plugin/file/file.c:200)[vs-1162]: assertion failed: ergo(level == LEAF_LEVEL && cbk_result == CBK_COORD_FOUND, uf_info->container == UF_CONTAINER_TAILS)
Kernel panic - not syncing: reiser4[less(7234)]: set_file_state (fs/reiser4/plugin/file/file.c:200)[vs-1162]: assertion failed: ergo(level == LEAF_LEVEL && cbk_result == CBK_COORD_FOUND, uf_info->container == UF_CONTAINER_TAILS)

and the second is:

reiser4[patch(25956)]: carry_level_invariant (fs/reiser4/carry.c:1250)[]: WARNING: wrong key order
reiser4 panicked cowardly: reiser4[patch(25956)]: carry_on_level (fs/reiser4/carry.c:356)[nikita-3002]: assertion failed: carry_level_invariant(doing, CARRY_DOING)
Kernel panic - not syncing: reiser4[patch(25956)]: carry_on_level (fs/reiser4/carry.c:356)[nikita-3002]: assertion failed: carry_level_invariant(doing, CARRY_DOING)

Both were for 2.6.18-rc2-mm1 [1]. The second error occurred on a recently fscked filesystem.

[1] with one patch reverted for unrelated reasons.

I hope this helps get to the root of the problem. Unfortunately, I do not yet have a reproducible test case.

Andrew Wade
Possible circular locking dependency detected in Reiser4
Hello, I got the following warning when I ran klive: Andrew Wade === [ INFO: possible circular locking dependency detected ] --- twistd/3816 is trying to acquire lock: (txnh-hlock){--..}, at: [txn_end+1011/1139] txn_end+0x3f3/0x473 but task is already holding lock: (atom-alock){--..}, at: [txnh_get_atom+28/120] txnh_get_atom+0x1c/0x78 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: - #1 (atom-alock){--..}: [lock_acquire+94/129] lock_acquire+0x5e/0x81 [_spin_lock+35/50] _spin_lock+0x23/0x32 [try_capture+733/2499] try_capture+0x2dd/0x9c3 [longterm_lock_znode+755/1026] longterm_lock_znode+0x2f3/0x402 [seal_validate+82/288] seal_validate+0x52/0x120 [write_sd_by_inode_common+659/1328] write_sd_by_inode_common+0x293/0x530 [reiser4_update_sd+37/44] reiser4_update_sd+0x25/0x2c [reiser4_dirty_inode+23/112] reiser4_dirty_inode+0x17/0x70 [__mark_inode_dirty+41/353] __mark_inode_dirty+0x29/0x161 [inode_setattr+345/355] inode_setattr+0x159/0x163 [setattr_common+86/131] setattr_common+0x56/0x83 [setattr_unix_file+493/507] setattr_unix_file+0x1ed/0x1fb [notify_change+260/533] notify_change+0x104/0x215 [sys_fchmodat+151/190] sys_fchmodat+0x97/0xbe [sys_chmod+18/20] sys_chmod+0x12/0x14 [sysenter_past_esp+86/141] sysenter_past_esp+0x56/0x8d - #0 (txnh-hlock){--..}: [lock_acquire+94/129] lock_acquire+0x5e/0x81 [_spin_lock+35/50] _spin_lock+0x23/0x32 [txn_end+1011/1139] txn_end+0x3f3/0x473 [reiser4_exit_context+172/287] reiser4_exit_context+0xac/0x11f [setattr_common+123/131] setattr_common+0x7b/0x83 [setattr_unix_file+493/507] setattr_unix_file+0x1ed/0x1fb [notify_change+260/533] notify_change+0x104/0x215 [sys_fchmodat+151/190] sys_fchmodat+0x97/0xbe [sys_chmod+18/20] sys_chmod+0x12/0x14 [sysenter_past_esp+86/141] sysenter_past_esp+0x56/0x8d other info that might help us debug this: 2 locks held by twistd/3816: #0: (inode-i_mutex){--..}, at: [mutex_lock+8/10] mutex_lock+0x8/0xa #1: (atom-alock){--..}, at: [txnh_get_atom+28/120] 
txnh_get_atom+0x1c/0x78 stack backtrace: [show_trace_log_lvl+84/253] show_trace_log_lvl+0x54/0xfd [show_trace+13/16] show_trace+0xd/0x10 [dump_stack+23/25] dump_stack+0x17/0x19 [print_circular_bug_tail+89/100] print_circular_bug_tail+0x59/0x64 [__lock_acquire+2084/2524] __lock_acquire+0x824/0x9dc [lock_acquire+94/129] lock_acquire+0x5e/0x81 [_spin_lock+35/50] _spin_lock+0x23/0x32 [txn_end+1011/1139] txn_end+0x3f3/0x473 [reiser4_exit_context+172/287] reiser4_exit_context+0xac/0x11f [setattr_common+123/131] setattr_common+0x7b/0x83 [setattr_unix_file+493/507] setattr_unix_file+0x1ed/0x1fb [notify_change+260/533] notify_change+0x104/0x215 [sys_fchmodat+151/190] sys_fchmodat+0x97/0xbe [sys_chmod+18/20] sys_chmod+0x12/0x14 [sysenter_past_esp+86/141] sysenter_past_esp+0x56/0x8d
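For readers unfamiliar with this kind of report, the check behind it can be sketched in a few lines of C. This is only an illustration of the idea (record "B was taken while A was held" edges and refuse an edge that would close a cycle), not the kernel's lockdep implementation. The names lock_graph, record_acquire and the LOCK_* constants are made up for the sketch. The ordering used in the usage note is the one the report implies, with atom-alock already depending on txnh-hlock.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the ones in the report above. */
enum { LOCK_INODE_MUTEX, LOCK_ATOM_ALOCK, LOCK_TXNH_HLOCK, MAX_CLASSES };

/* dep[a][b] is true when class b has been acquired while class a was held. */
struct lock_graph {
	bool dep[MAX_CLASSES][MAX_CLASSES];
};

void lock_graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can we get from `from` to `to` along recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++) {
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record "acquire `taken` while holding `held`".
 * Returns false, without recording the edge, if `held` is already
 * reachable from `taken`, i.e. the new edge would close a cycle.
 * Taking a class while already holding it is also refused.
 */
bool record_acquire(struct lock_graph *g, int held, int taken)
{
	bool seen[MAX_CLASSES] = { false };

	if (reachable(g, taken, held, seen))
		return false;
	g->dep[held][taken] = true;
	return true;
}
```

With this sketch, recording record_acquire(&g, LOCK_TXNH_HLOCK, LOCK_ATOM_ALOCK) first succeeds, and a later record_acquire(&g, LOCK_ATOM_ALOCK, LOCK_TXNH_HLOCK), which corresponds to the txn_end path in the trace, is refused. Like the real warning, a refusal only says the two orderings have both been observed, not that a deadlock actually happened.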
Re: reiser4 bug in 2.6.16-rc2-mm1
On Friday 10 February 2006 09:22, Maarten Deprez wrote: Hello, reiser4 on linux 2.6.16-rc2-mm1 bugs for me in plugins/file/tail_conversion.c line 29, locking up a process sometimes, when it is reading a file. Greetings, Maarten Deprez Still present in 2.6.16-rc3-mm1: [ cut here ] kernel BUG at fs/reiser4/plugin/file/tail_conversion.c:81! invalid opcode: [#1] PREEMPT last sysfs file: /devices/pci:00/:00:01.0/:01:00.0/i2c-0/name CPU:0 EIP:0060:[get_nonexclusive_access+30/49]Not tainted VLI EFLAGS: 00010286 (2.6.16-rc3-mm1 #2) EIP is at get_nonexclusive_access+0x1e/0x31 eax: cc2ec288 ebx: ecx: cb87d4e8 edx: esi: cb87d4e8 edi: ebp: d30ba574 esp: d3934e00 ds: 007b es: 007b ss: 0068 Process kmail (pid: 21299, threadinfo=d3934000 task=d8577570) Stack: 0c01ca9ae d94d157c d3934ed8 cb87d540 000f 00320af1 df136ef4 f000 d20b2494 1000 d94d158c 0127 43f543c8 2ac0f373 d94d158c Call Trace: c01ca9ae write_extent+0x68d/0xbc3 c01cd0e2 item_length_by_coord+0xb/0xf c01c8125 nr_units_extent+0x5/0xd c01c94ef init_coord_extension_extent+0x60/0xdf c01b5031 set_file_state+0x26/0x5b c01b5128 find_file_item+0xc2/0xd4 c01ca321 write_extent+0x0/0xbc3 c01b6c46 write_flow+0x248/0x2df c01b74bc write_unix_file+0x343/0x4cc c01345f6 lru_cache_add_active+0x47/0x5d c01b7179 write_unix_file+0x0/0x4cc c01488f5 vfs_write+0x83/0x122 c014910e sys_write+0x3c/0x63 c0102ac7 sysenter_past_esp+0x54/0x75 Code: 81 c4 b0 00 00 00 89 e8 5b 5e 5f 5d c3 85 d2 89 c1 75 20 b8 00 f0 ff ff 21 e0 8b 00 8b 80 c4 04 00 00 8b 40 40 83 78 08 00 74 08 0f 0b 51 00 b8 b8 38 c0 89 c8 ff 00 0f 88 e1 06 00 00 c3 55 ba 44reiser4[kmail(21299)]: release_unix_file (fs/reiser4/plugin/file/file.c:2674)[vs-44]: WARNING: out of memory? 4reiser4[kmail(21299)]: release_unix_file (fs/reiser4/plugin/file/file.c:2674)[vs-44]: WARNING: out of memory? ...
Re: Unexpected reset corrupted Reiser4 filesystem
John Dong wrote: If these were IDE drives, the IDE writeback cache is probably the bad boy -- on FreeBSD 5.x, Soft Updates is virtually broken on IDE drives because they simply haven't written all the data they promised the kernel that they had. I do indeed have an IDE drive (Seagate Barracuda) with a writeback cache. But I thought that write barriers were now working (by flushing the writeback cache if the drive doesn't support anything fancier). However, I couldn't find any updates on the write barrier work since March of last year (http://lwn.net/Articles/77074/). So the writeback cache may indeed be the bad boy. On May 25, 2005 12:49 am, David Masover wrote: That's what it all comes down to -- make backups. The fact that you have journalling/transactions/fsck/batteries/RAID is all just to make it a little less catastrophic when stuff does fail. Yup, time for me to make backups. Thanks for the nudge. Especially as I'm running a bleeding-edge kernel. (I did have one eat some of my data). Andrew P.S. My internet connection's been flaky lately, so apologies for any bounces. I check the mailing list archives for missed messages.
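The ordering concern in this exchange (data has to be on stable storage before the record that declares it committed) can be sketched from userspace. This is a minimal illustration only, not how Reiser4 or the kernel's write barriers work. ordered_commit and write_all are hypothetical helper names, and fsync() is the standard POSIX call. Whether fsync() also flushes the drive's own writeback cache depends on the kernel and the drive, which is exactly the doubt raised above.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n <= 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: write `payload`, ask the kernel to flush it,
 * and only then write and flush `commit_record`. A crash between the
 * two fsync() calls should leave payload without a commit record,
 * never the other way round, provided the storage stack honours the
 * flush requests.
 */
int ordered_commit(const char *path, const char *payload,
		   const char *commit_record)
{
	int rc = -1;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write_all(fd, payload, strlen(payload)) == 0 &&
	    fsync(fd) == 0 &&
	    write_all(fd, commit_record, strlen(commit_record)) == 0 &&
	    fsync(fd) == 0)
		rc = 0;
	if (close(fd) != 0)
		rc = -1;
	return rc;
}
```

If the drive acknowledges the first flush while the payload is still only in its cache, the guarantee is lost no matter what the software above it does.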
Unexpected reset corrupted Reiser4 filesystem
Hello, One of my Reiser4 filesystems was corrupted by a power glitch. fsck fixed the corruption, but my understanding is that an unexpected reset should not have corrupted the filesystem. I have an image of the corrupted filesystem; is it of any use to anyone? Details: kernel: 2.6.12-rc4-mm2 fsck.reiser4: 1.0.4 I was installing Oracle when the power flickered. I was unable to delete Oracle's directory due to what was reported as I/O errors. fsck revealed a corrupted filesystem (FSCK: Node (13142228), item (77), unit (0): Points to the block (12981542) which is in the tree already. The whole subtree is skipped.) I have an image of the partition at this point, and dd reported no errors while copying. The image is unfortunately a bit large to upload (70 GB), but I am happy to run diagnostic tools against it. Andrew Wade