Re: "BUG: held lock freed!" lock validator tripped by kswapd & xfs

2006-12-03 Thread David Chinner
On Fri, Dec 01, 2006 at 02:34:42PM -0800, Stephen Pollei wrote:
> On 12/1/06, Mike Mattie <[EMAIL PROTECTED]> wrote:
> 
> >In an attempt to debug another kernel issue I turned on the lock validator 
> >and
> >managed to generate this report.
> >
> >As a side note the first attempt to boot with the lock validator failed 
> >with
> >a message indicating I had exceeded MAX_LOCK_DEPTH. To get this trace
> >I patched sched.h: MAX_LOCK_DEPTH to 60.
> >
> >Dec  1 08:35:41 reforged [ 3052.513931] =
> >Dec  1 08:35:41 reforged [ 3052.513937] [ BUG: held lock freed! ]
> >Dec  1 08:35:41 reforged [ 3052.513939] -
> >Dec  1 08:35:41 reforged [ 3052.513943] kswapd0/183 is freeing memory
> >c3458000-c3458fff, with a lock still held there! Dec  1 08:35:41
> >reforged [ 3052.513947]  (&(>i_iolock)->mr_lock){}, at:
> >[] xfs_ilock+0x20/0x75 Dec  1 08:35:41 reforged
> >[ 3052.513959] 28 locks held by kswapd0/183: Dec  1 08:35:41 reforged
> >[ 3052.513961]  #0:  (&(>i_iolock)->mr_lock){}, at:
> >[] xfs_ilock+0x20/0x75 Dec  1 08:35:41 reforged
> >[ 3052.513968]  #1:  (&(>i_lock)->mr_lock){}, at: []
> >xfs_ilock+0x52/0x75 Dec  1 08:35:41 reforged [ 3052.513975]
> 
> seems to alternate between same two locks. But both c089 and
> c0bb are not between the page(oxfff=4095 or about 4k) which kswapd
> is trying to get rid of.
> I think this trace is on crack somehow.

IIRC, lockdep doesn't understand the xfs inode locks yet. We've
got a patch to fix most of this, but I don't think it's been merged.

Cheers,

Dave
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: held lock freed! lock validator tripped by kswapd xfs

2006-12-03 Thread David Chinner
On Fri, Dec 01, 2006 at 02:34:42PM -0800, Stephen Pollei wrote:
 On 12/1/06, Mike Mattie [EMAIL PROTECTED] wrote:
 
 In an attempt to debug another kernel issue I turned on the lock validator 
 and
 managed to generate this report.
 
 As a side note the first attempt to boot with the lock validator failed 
 with
 a message indicating I had exceeded MAX_LOCK_DEPTH. To get this trace
 I patched sched.h: MAX_LOCK_DEPTH to 60.
 
 Dec  1 08:35:41 reforged [ 3052.513931] =
 Dec  1 08:35:41 reforged [ 3052.513937] [ BUG: held lock freed! ]
 Dec  1 08:35:41 reforged [ 3052.513939] -
 Dec  1 08:35:41 reforged [ 3052.513943] kswapd0/183 is freeing memory
 c3458000-c3458fff, with a lock still held there! Dec  1 08:35:41
 reforged [ 3052.513947]  ((ip-i_iolock)-mr_lock){}, at:
 [c089] xfs_ilock+0x20/0x75 Dec  1 08:35:41 reforged
 [ 3052.513959] 28 locks held by kswapd0/183: Dec  1 08:35:41 reforged
 [ 3052.513961]  #0:  ((ip-i_iolock)-mr_lock){}, at:
 [c089] xfs_ilock+0x20/0x75 Dec  1 08:35:41 reforged
 [ 3052.513968]  #1:  ((ip-i_lock)-mr_lock){}, at: [c0bb]
 xfs_ilock+0x52/0x75 Dec  1 08:35:41 reforged [ 3052.513975]
 
 seems to alternate between same two locks. But both c089 and
 c0bb are not between the page(oxfff=4095 or about 4k) which kswapd
 is trying to get rid of.
 I think this trace is on crack somehow.

IIRC, lockdep doesn't understand the xfs inode locks yet. We've
got a patch to fix most of this, but I don't think it's been merged.

Cheers,

Dave
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: "BUG: held lock freed!" lock validator tripped by kswapd & xfs

2006-12-01 Thread Stephen Pollei

On 12/1/06, Mike Mattie <[EMAIL PROTECTED]> wrote:


In an attempt to debug another kernel issue I turned on the lock validator and
managed to generate this report.

As a side note the first attempt to boot with the lock validator failed with
a message indicating I had exceeded MAX_LOCK_DEPTH. To get this trace
I patched sched.h: MAX_LOCK_DEPTH to 60.

Dec  1 08:35:41 reforged [ 3052.513931] =
Dec  1 08:35:41 reforged [ 3052.513937] [ BUG: held lock freed! ]
Dec  1 08:35:41 reforged [ 3052.513939] -
Dec  1 08:35:41 reforged [ 3052.513943] kswapd0/183 is freeing memory
c3458000-c3458fff, with a lock still held there! Dec  1 08:35:41
reforged [ 3052.513947]  (&(>i_iolock)->mr_lock){}, at:
[] xfs_ilock+0x20/0x75 Dec  1 08:35:41 reforged
[ 3052.513959] 28 locks held by kswapd0/183: Dec  1 08:35:41 reforged
[ 3052.513961]  #0:  (&(>i_iolock)->mr_lock){}, at:
[] xfs_ilock+0x20/0x75 Dec  1 08:35:41 reforged
[ 3052.513968]  #1:  (&(>i_lock)->mr_lock){}, at: []
xfs_ilock+0x52/0x75 Dec  1 08:35:41 reforged [ 3052.513975]


seems to alternate between same two locks. But both c089 and
c0bb are not between the page(oxfff=4095 or about 4k) which kswapd
is trying to get rid of.
I think this trace is on crack somehow.


[ 3052.514136] stack backtrace: Dec  1 08:35:41 reforged
[ 3052.514139]  [] show_trace+0x16/0x19 Dec  1 08:35:41
reforged [ 3052.514146]  [] dump_stack+0x1a/0x1f Dec  1
08:35:41 reforged [ 3052.514150]  []
debug_check_no_locks_freed+0xe0/0xff Dec  1 08:35:41 reforged
[ 3052.514159]  [] free_hot_cold_page+0x96/0x109 Dec  1
08:35:41 reforged [ 3052.514166]  [] __pagevec_free+0x1c/0x27
Dec  1 08:35:41 reforged [ 3052.514170]  []
__pagevec_release_nonlru+0x65/0x71 Dec  1 08:35:41 reforged
[ 3052.514176]  [] shrink_inactive_list+0x4b1/0x722 Dec  1
08:35:41 reforged [ 3052.514181]  [] shrink_zone+0xba/0xd9
Dec  1 08:35:41 reforged [ 3052.514185]  []
kswapd+0x26a/0x361 Dec  1 08:35:41 reforged [ 3052.514189]
[] kthread+0xb0/0xe1 Dec  1 08:35:41 reforged [ 3052.514192]
[] kernel_thread_helper+0x5/0xb reforged log #




Linux reforged 2.6.18.3 #4 PREEMPT Fri Dec 1 06:15:05 PST 2006 i686 AMD 
Athlon(tm) XP 3000+ AuthenticAMD GNU/Linux


I know you are running preempt on up machine. I'd try running 2.6.18.4
with a small patch like this and see if you can't cause it to recrash
for you. print_freed_lock_bug uses printk which in theory might be
causing a preempt .

diff -urp linux-2.6.18.4/include/linux/sched.h linux-debug/include/linux/sched.h
--- linux-2.6.18.4/include/linux/sched.h2006-11-29
11:28:40.0 -0800
+++ linux-debug/include/linux/sched.h   2006-12-01 13:25:23.0 -0800
@@ -936,7 +936,7 @@ struct task_struct {
   int softirq_context;
#endif
#ifdef CONFIG_LOCKDEP
-# define MAX_LOCK_DEPTH 30UL
+# define MAX_LOCK_DEPTH (60UL)
   u64 curr_chain_key;
   int lockdep_depth;
   struct held_lock held_locks[MAX_LOCK_DEPTH];
diff -urp linux-2.6.18.4/kernel/lockdep.c linux-debug/kernel/lockdep.c
--- linux-2.6.18.4/kernel/lockdep.c 2006-11-29 11:28:40.0 -0800
+++ linux-debug/kernel/lockdep.c2006-12-01 14:22:14.0 -0800
@@ -2608,6 +2608,7 @@ void debug_check_no_locks_freed(const vo
   return;

   local_irq_save(flags);
+   preempt_disable();
   for (i = 0; i < curr->lockdep_depth; i++) {
   hlock = curr->held_locks + i;

@@ -2621,6 +2622,7 @@ void debug_check_no_locks_freed(const vo
   print_freed_lock_bug(curr, mem_from, mem_to, hlock);
   break;
   }
+   preempt_enable();
   local_irq_restore(flags);
}


--
http://dmoz.org/profiles/pollei.html
http://sourceforge.net/users/stephen_pollei/
http://www.orkut.com/Profile.aspx?uid=2455954990164098214
http://stephen_pollei.home.comcast.net/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: BUG: held lock freed! lock validator tripped by kswapd xfs

2006-12-01 Thread Stephen Pollei

On 12/1/06, Mike Mattie [EMAIL PROTECTED] wrote:


In an attempt to debug another kernel issue I turned on the lock validator and
managed to generate this report.

As a side note the first attempt to boot with the lock validator failed with
a message indicating I had exceeded MAX_LOCK_DEPTH. To get this trace
I patched sched.h: MAX_LOCK_DEPTH to 60.

Dec  1 08:35:41 reforged [ 3052.513931] =
Dec  1 08:35:41 reforged [ 3052.513937] [ BUG: held lock freed! ]
Dec  1 08:35:41 reforged [ 3052.513939] -
Dec  1 08:35:41 reforged [ 3052.513943] kswapd0/183 is freeing memory
c3458000-c3458fff, with a lock still held there! Dec  1 08:35:41
reforged [ 3052.513947]  ((ip-i_iolock)-mr_lock){}, at:
[c089] xfs_ilock+0x20/0x75 Dec  1 08:35:41 reforged
[ 3052.513959] 28 locks held by kswapd0/183: Dec  1 08:35:41 reforged
[ 3052.513961]  #0:  ((ip-i_iolock)-mr_lock){}, at:
[c089] xfs_ilock+0x20/0x75 Dec  1 08:35:41 reforged
[ 3052.513968]  #1:  ((ip-i_lock)-mr_lock){}, at: [c0bb]
xfs_ilock+0x52/0x75 Dec  1 08:35:41 reforged [ 3052.513975]


seems to alternate between same two locks. But both c089 and
c0bb are not between the page(oxfff=4095 or about 4k) which kswapd
is trying to get rid of.
I think this trace is on crack somehow.


[ 3052.514136] stack backtrace: Dec  1 08:35:41 reforged
[ 3052.514139]  [c0103cb9] show_trace+0x16/0x19 Dec  1 08:35:41
reforged [ 3052.514146]  [c01040f7] dump_stack+0x1a/0x1f Dec  1
08:35:41 reforged [ 3052.514150]  [c012be74]
debug_check_no_locks_freed+0xe0/0xff Dec  1 08:35:41 reforged
[ 3052.514159]  [c014122d] free_hot_cold_page+0x96/0x109 Dec  1
08:35:41 reforged [ 3052.514166]  [c01412bc] __pagevec_free+0x1c/0x27
Dec  1 08:35:41 reforged [ 3052.514170]  [c01435dc]
__pagevec_release_nonlru+0x65/0x71 Dec  1 08:35:41 reforged
[ 3052.514176]  [c0144702] shrink_inactive_list+0x4b1/0x722 Dec  1
08:35:41 reforged [ 3052.514181]  [c0144a2d] shrink_zone+0xba/0xd9
Dec  1 08:35:41 reforged [ 3052.514185]  [c0144e9e]
kswapd+0x26a/0x361 Dec  1 08:35:41 reforged [ 3052.514189]
[c012742b] kthread+0xb0/0xe1 Dec  1 08:35:41 reforged [ 3052.514192]
[c0101005] kernel_thread_helper+0x5/0xb reforged log #




Linux reforged 2.6.18.3 #4 PREEMPT Fri Dec 1 06:15:05 PST 2006 i686 AMD 
Athlon(tm) XP 3000+ AuthenticAMD GNU/Linux


I know you are running preempt on up machine. I'd try running 2.6.18.4
with a small patch like this and see if you can't cause it to recrash
for you. print_freed_lock_bug uses printk which in theory might be
causing a preempt .

diff -urp linux-2.6.18.4/include/linux/sched.h linux-debug/include/linux/sched.h
--- linux-2.6.18.4/include/linux/sched.h2006-11-29
11:28:40.0 -0800
+++ linux-debug/include/linux/sched.h   2006-12-01 13:25:23.0 -0800
@@ -936,7 +936,7 @@ struct task_struct {
   int softirq_context;
#endif
#ifdef CONFIG_LOCKDEP
-# define MAX_LOCK_DEPTH 30UL
+# define MAX_LOCK_DEPTH (60UL)
   u64 curr_chain_key;
   int lockdep_depth;
   struct held_lock held_locks[MAX_LOCK_DEPTH];
diff -urp linux-2.6.18.4/kernel/lockdep.c linux-debug/kernel/lockdep.c
--- linux-2.6.18.4/kernel/lockdep.c 2006-11-29 11:28:40.0 -0800
+++ linux-debug/kernel/lockdep.c2006-12-01 14:22:14.0 -0800
@@ -2608,6 +2608,7 @@ void debug_check_no_locks_freed(const vo
   return;

   local_irq_save(flags);
+   preempt_disable();
   for (i = 0; i  curr-lockdep_depth; i++) {
   hlock = curr-held_locks + i;

@@ -2621,6 +2622,7 @@ void debug_check_no_locks_freed(const vo
   print_freed_lock_bug(curr, mem_from, mem_to, hlock);
   break;
   }
+   preempt_enable();
   local_irq_restore(flags);
}


--
http://dmoz.org/profiles/pollei.html
http://sourceforge.net/users/stephen_pollei/
http://www.orkut.com/Profile.aspx?uid=2455954990164098214
http://stephen_pollei.home.comcast.net/
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/