On Thu, Jul 12, 2012 at 4:05 PM, Dave Jones <da...@redhat.com> wrote:
> On Thu, Jul 12, 2012 at 03:52:17PM +0200, Kay Sievers wrote:
> > On Thu, Jul 12, 2012 at 2:54 AM, Dave Jones <da...@redhat.com> wrote:
> > > On Mon, Jul 09, 2012 at 08:48:51PM +0200, Kay Sievers wrote:
> > > >
> > > > It looks fine here with the above mentioned patch:
> > >
> > > Now that that patch is in Linus tree, I've hit what's probably a different case.
> > > Look at the modules list in this oops..
> > >
> > > [10016.460020] BUG: soft lockup - CPU#1 stuck for 22s! [trinity-child1:24295]
> > > [10016.470008] rose<4>[10016.470008] ip_set_bitmap_ipmac<4>[10016.470008]
> > >
> > > Also, I have no idea how the hell the 'Modules linked in:' line (9th line)
> > > ended up being printed /after/ the module listing began (2nd line).

They do not belong together. The second line is just another call to
the same print_modules(), done from:
  arch/x86/kernel/dumpstack_64.c :: show_regs()

We had already called print_modules() a few cycles earlier from:
  kernel/watchdog :: watchdog_timer_fn()

> > I tried to force all sorts of racy print_modules() calls, and kept
> > your trinity tool from git running for hours, it looks all fine here:
> >
> > Can you easily reproduce the issue you pasted? If so, could you give me
> > the /dev/kmsg output?
>
> I've seen it a few times, always with the soft lockup trace.
>
> You might be able to trigger it using scripts/load-all-modules.sh
> from trinity.git. (Assuming you have a lot of modules built, I'm
> still trying to track down which one seems to be responsible).

Hmm, it does not trigger your pattern. I tried adding an rmmod in that
loop, but that crashes after a few seconds. Some modules are just not
meant to be removed. :)

I forced the watchdog to trigger by setting the timeout to 1s, but
everything still looks fine:

[ 20.854010] BUG: soft lockup - CPU#0 stuck for 1000s!
[trinity:247]
[ 20.854010] Modules linked in: nfnetlink xfrm_user xfrm_algo pppoe pppox ppp_generic slhc atm bluetooth rfkill microcode cirrus ttm sr_mod cdrom pcspkr drm_kms_helper drm syscopyarea sysfillrect sysimgblt floppy evdev ipv6
[ 20.854010] irq event stamp: 980768
[ 20.854010] hardirqs last enabled at (980767): [<ffffffff814a425d>] __slab_alloc.constprop.65+0x3c9/0x408
[ 20.854010] hardirqs last disabled at (980768): [<ffffffff814adfaa>] apic_timer_interrupt+0x6a/0x80
[ 20.854010] softirqs last enabled at (978462): [<ffffffff8103db1f>] __do_softirq+0x11f/0x170
[ 20.854010] softirqs last disabled at (978457): [<ffffffff814ae82c>] call_softirq+0x1c/0x30
[ 20.854010] CPU 0
[ 20.854010] Modules linked in: nfnetlink xfrm_user xfrm_algo pppoe pppox ppp_generic slhc atm bluetooth rfkill microcode cirrus ttm sr_mod cdrom pcspkr drm_kms_helper drm syscopyarea sysfillrect sysimgblt floppy evdev ipv6

Could it be that we get some sort of corruption somewhere else while
running trinity and loading modules?

Kay
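
For anyone comparing their own captures against the traces in this thread, here is a minimal Python sketch that scans printk-style log text for 'Modules linked in:' lines and reports module lists that show up more than once. A repeat is what you would expect when print_modules() runs once from the watchdog and again from show_regs(). The function names, the regex, and the shortened sample log are illustrative assumptions, not anything from the kernel tree or from trinity.

```python
import re
from collections import defaultdict

# Assumed line shape: "[ 20.854010] message", as in the traces in this thread.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")
MARKER = "Modules linked in:"


def find_module_listings(log_text):
    """Return (timestamp, [modules]) for every 'Modules linked in:' line."""
    listings = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not a timestamped log line
        timestamp, message = match.groups()
        if message.startswith(MARKER):
            modules = message[len(MARKER):].split()
            listings.append((timestamp, modules))
    return listings


def repeated_listings(log_text):
    """Map each module list printed more than once to its timestamps."""
    seen = defaultdict(list)
    for timestamp, modules in find_module_listings(log_text):
        seen[tuple(modules)].append(timestamp)
    return {mods: stamps for mods, stamps in seen.items() if len(stamps) > 1}


# Hypothetical sample, shortened from the trace above for illustration only.
SAMPLE_LOG = """\
[ 20.854010] Modules linked in: nfnetlink xfrm_user ipv6
[ 20.854010] irq event stamp: 980768
[ 20.854010] CPU 0
[ 20.854010] Modules linked in: nfnetlink xfrm_user ipv6
"""

if __name__ == "__main__":
    for mods, stamps in repeated_listings(SAMPLE_LOG).items():
        print(len(stamps), "listings of", len(mods), "modules at", stamps)
```

A listing that fails to parse as a clean 'Modules linked in:' line (for example, module names interleaved with '<4>' prefixes) will simply not be counted, which is itself a hint that the output was mangled rather than printed twice.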