Andreas...

Andreas Dilger wrote:
> On 2010-03-17, at 02:59, Gregory Matthews wrote:
>> Gregory Matthews wrote:
>>> BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/6024
>>> caller is set_ptldebug_header+0x41/0xa0 [libcfs]
>>> Pid: 6024, comm: modprobe Not tainted 2.6.27.39-default #2
>>>
>>> Call Trace:
>>>  [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8
>>>  [<ffffffffa0591171>] set_ptldebug_header+0x41/0xa0 [libcfs]
>>>  [<ffffffffa0599bb0>] libcfs_debug_vmsg2+0x70/0x990 [libcfs]
>>>  [<ffffffff802514c2>] smp_call_function+0x3f/0x5e
>>>  [<ffffffff80254e9a>] load_module+0x166f/0x176e
>>>  [<ffffffffa014c000>] init_obdclass+0x0/0x3e4 [obdclass]
>>>  [<ffffffffa014c04b>] init_obdclass+0x4b/0x3e4 [obdclass]
>>>  [<ffffffff80209041>] _stext+0x41/0x110
>>>  [<ffffffff80255037>] sys_init_module+0x9e/0x1ab
>>>  [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b

> In lnet/libcfs/tracefile.c::libcfs_debug_vmsg2() you could try moving 
> set_ptldebug_header() after the call to trace_get_tcd(), which should 
> pin the thread to the CPU by disabling preempt and stop the warning.

Thanks for the advice. I've compiled new Lustre packages with the 
suggested fix and am currently running racer. So far the above bug has 
not reappeared, but there is something similar:

Mar 18 10:34:57 i15-pilatus1 kernel: BUG: using smp_processor_id() in preemptible [00000000] code: syslog-ng/2332
Mar 18 10:34:57 i15-pilatus1 kernel: caller is wake_up_klogd+0x27/0x3d
Mar 18 10:34:57 i15-pilatus1 kernel: Pid: 2332, comm: syslog-ng Not tainted 2.6.27.39-default #2
Mar 18 10:34:57 i15-pilatus1 kernel:
Mar 18 10:34:57 i15-pilatus1 kernel: Call Trace:
Mar 18 10:34:57 i15-pilatus1 kernel:  [<ffffffff80313c2b>] debug_smp_processor_id+0xd3/0xe8
Mar 18 10:34:57 i15-pilatus1 kernel:  [<ffffffff80234af4>] wake_up_klogd+0x27/0x3d
Mar 18 10:34:57 i15-pilatus1 kernel:  [<ffffffff80366a55>] write_chan+0x25a/0x2ef
Mar 18 10:34:57 i15-pilatus1 kernel:  [<ffffffff8022bf45>] default_wake_function+0x0/0xe
Mar 18 10:34:57 i15-pilatus1 kernel:  [<ffffffff80364bfa>] tty_write+0x191/0x227
Mar 18 10:34:57 i15-pilatus1 kernel:  [<ffffffff803667fb>] write_chan+0x0/0x2ef
Mar 18 10:34:57 i15-pilatus1 kernel:  [<ffffffff80294b00>] vfs_write+0xad/0x136
Mar 18 10:34:57 i15-pilatus1 kernel:  [<ffffffff80294f86>] sys_write+0x45/0x6e
Mar 18 10:34:57 i15-pilatus1 kernel:  [<ffffffff8020bf8b>] system_call_fastpath+0x16/0x1b

wake_up_klogd looks like a kernel function from kernel/printk.c, so 
perhaps this is /now/ kernel bug 12518. In the meantime, Lustre just got 
more compatible with the preempt kernel. Will this fix make it into 
mainline?

My diff is:

--- lnet/libcfs/tracefile.c.orig        2010-03-18 08:45:53.000000000 +0000
+++ lnet/libcfs/tracefile.c     2010-03-18 08:50:09.000000000 +0000
@@ -258,10 +258,13 @@
          if (strchr(file, '/'))
                  file = strrchr(file, '/') + 1;

+        tcd = trace_get_tcd();

+       /* suggested fix for stack traces in syslog, proposed by Andreas
+        * http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/903cbf9ac251db33
+        */
          set_ptldebug_header(&header, subsys, mask, line, CDEBUG_STACK());

-        tcd = trace_get_tcd();
          if (tcd == NULL)                /* arch may not log in IRQ context */
                  goto console;

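For anyone following along, here is a rough userspace sketch of why the reordering in the diff should silence the first warning. It is not Lustre or kernel code: every sim_* name is a stand-in I made up, the check is a simplified version of what debug_smp_processor_id() does, and it assumes (as Andreas describes) that trace_get_tcd() disables preemption until the tcd is released.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace stand-ins only; none of these are real kernel or Lustre symbols. */
static int sim_preempt_count;   /* 0 means the caller is preemptible */
static int sim_warnings;        /* how many times the debug check fired */

static void sim_preempt_disable(void) { sim_preempt_count++; }

static void sim_preempt_enable(void)
{
    assert(sim_preempt_count > 0);  /* catch unbalanced enable calls */
    sim_preempt_count--;
}

/* Simplified model of the check behind debug_smp_processor_id(): warn when
 * the caller could be moved to another CPU right after reading the id. */
static int sim_smp_processor_id(void)
{
    if (sim_preempt_count == 0) {
        sim_warnings++;
        fprintf(stderr, "BUG: using smp_processor_id() in preemptible code\n");
    }
    return 0;  /* a single simulated CPU */
}

struct sim_header { int cpu; };

/* Stand-in for set_ptldebug_header(): records the current CPU id. */
static void sim_set_header(struct sim_header *h)
{
    h->cpu = sim_smp_processor_id();
}

/* Stand-ins for taking and releasing the tcd. ASSUMPTION: taking the tcd
 * disables preemption, releasing it re-enables preemption. */
static int sim_tcd;
static int *sim_get_tcd(void) { sim_preempt_disable(); return &sim_tcd; }
static void sim_put_tcd(void) { sim_preempt_enable(); }

/* Original order: header is filled in before the tcd is taken. */
static int log_header_first(void)
{
    struct sim_header h;

    sim_warnings = 0;
    sim_set_header(&h);     /* still preemptible here, so the check fires */
    (void)sim_get_tcd();
    sim_put_tcd();
    return sim_warnings;
}

/* Patched order: tcd is taken first, so the header is filled in while
 * preemption is disabled. */
static int log_tcd_first(void)
{
    struct sim_header h;

    sim_warnings = 0;
    (void)sim_get_tcd();
    sim_set_header(&h);     /* preemption disabled, no warning */
    sim_put_tcd();
    return sim_warnings;
}
```

In this model log_header_first() returns 1 (one warning) and log_tcd_first() returns 0. The second trace, from wake_up_klogd, never goes through libcfs at all, so nothing in the sketch or in the diff would affect it.
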

GREG

> 
> Cheers, Andreas
> -- 
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
> 
> 


-- 
Greg Matthews            01235 778658
Senior Computer Systems Administrator
Diamond Light Source, Oxfordshire, UK