(2014/06/05 0:23), Peter Moody wrote:
>
> On Wed, Jun 04 2014 at 07:07, Masami Hiramatsu wrote:
>
>>> Thank you for reporting that. I've tried to reproduce it with your code,
>>> but have not succeeded yet. Could you share your kernel config with us too?
>>
>> Hmm, it seems that on my environment (Fedora 20, gcc version 4.8.2 20131212),
>> do_execve() in sys_execve has been optimized out (and do_execve_common() is
>> also renamed). I'll try to rebuild it. However, since such optimization
>> sometimes depends on the kernel config, I'd like to do it with your config.
>>
>> Thank you,
>
> Sure thing, sorry for not attaching it to begin with.
>
> One other thing: at least on the systems where I've been able to reproduce
> this, the more processes there were, the more likely the machine was to
> deadlock without emitting a splat first. E.g. on a 12-core machine I got
> the splat with 32 processes and a deadlock with 50. On a 2-core qemu
> virtual machine I got a deadlock with 32 and a splat with something like
> 12 or 16.
>
> And FWIW, I'm running Ubuntu Precise, with gcc version 4.6.3 (Ubuntu/Linaro
> 4.6.3-1ubuntu5).
Thank you for sharing the kconfig. I saw that CONFIG_DEBUG_ATOMIC_SLEEP was
not set in your kconfig. When I set it and ran your test, I got (a lot of)
the warnings below instead of a deadlock.

[ 342.072132] BUG: sleeping function called from invalid context at /home/fedora/ksrc/linux-3/kernel/fork.c:615
[ 342.080684] in_atomic(): 1, irqs_disabled(): 1, pid: 5017, name: execve
[ 342.080684] INFO: lockdep is turned off.
[ 342.080684] irq event stamp: 0
[ 342.080684] hardirqs last  enabled at (0): [<          (null)>]           (null)
[ 342.080684] hardirqs last disabled at (0): [<ffffffff81045468>] copy_process.part.31+0x5ba/0x183d
[ 342.080684] softirqs last  enabled at (0): [<ffffffff81045468>] copy_process.part.31+0x5ba/0x183d
[ 342.080684] softirqs last disabled at (0): [<          (null)>]           (null)
[ 342.080684] CPU: 5 PID: 5017 Comm: execve Not tainted 3.15.0-rc8+ #7
[ 342.080684] Hardware name: Red Hat Inc. OpenStack Nova, BIOS 0.5.1 01/01/2007
[ 342.080684]  0000000000000000 ffff8803ff81bdf8 ffffffff81554140 ffff88040a9df500
[ 342.080684]  ffff8803ff81be08 ffffffff8106d17c ffff8803ff81be20 ffffffff81044bd8
[ 342.080684]  ffffffff8114ad8f ffff8803ff81be30 ffffffffa015802d ffff8803ff81be88
[ 342.080684] Call Trace:
[ 342.080684]  [<ffffffff81554140>] dump_stack+0x4d/0x66
[ 342.080684]  [<ffffffff8106d17c>] __might_sleep+0x118/0x11a
[ 342.080684]  [<ffffffff81044bd8>] mmput+0x20/0xd9
[ 342.080684]  [<ffffffff8114ad8f>] ? SyS_execve+0x2a/0x2e
[ 342.080684]  [<ffffffffa015802d>] exec_handler+0x2d/0x34 [exec_mm_probe]
[ 342.080684]  [<ffffffff81032a2c>] trampoline_handler+0x11b/0x1ac
[ 342.080684]  [<ffffffff8103265a>] kretprobe_trampoline+0x25/0x4c
[ 342.080684]  [<ffffffff81032635>] ? kretprobe_trampoline_holder+0x9/0x9
[ 342.080684]  [<ffffffff8155ca99>] stub_execve+0x69/0xa0

Here, as you can see, calling mmput() in the kretprobe handler is actually
the root cause of this problem.

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com
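For readers following along, below is a minimal sketch of the pattern the
trace above points at. This is a reconstruction from the symbols in the
splat (exec_handler in [exec_mm_probe]); the probe point, data layout and
other details are assumptions, not necessarily Peter's exact module. The
point is the mmput() in the return handler: kretprobe handlers run from
the trampoline with preemption and interrupts disabled, but mmput() may
sleep when it drops the last mm reference (it can end up in exit_mmap()).

/*
 * exec_mm_probe (sketch): pin the task's mm at sys_execve() entry and
 * drop it in the return handler.  The mmput() here is the bug being
 * discussed: it may sleep, yet the handler runs in atomic context.
 */
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>

static int exec_entry(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	/* Stash a reference to the pre-exec mm in the per-instance data. */
	*(struct mm_struct **)ri->data = get_task_mm(current);
	return 0;
}

static int exec_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	struct mm_struct *mm = *(struct mm_struct **)ri->data;

	if (mm)
		mmput(mm);	/* BUG: may sleep, but we are atomic here */
	return 0;
}

static struct kretprobe exec_krp = {
	.kp.symbol_name	= "sys_execve",	/* assumed probe point */
	.entry_handler	= exec_entry,
	.handler	= exec_handler,
	.data_size	= sizeof(struct mm_struct *),
	.maxactive	= 64,
};

static int __init exec_mm_probe_init(void)
{
	return register_kretprobe(&exec_krp);
}

static void __exit exec_mm_probe_exit(void)
{
	unregister_kretprobe(&exec_krp);
}

module_init(exec_mm_probe_init);
module_exit(exec_mm_probe_exit);
MODULE_LICENSE("GPL");

With CONFIG_DEBUG_ATOMIC_SLEEP=y, __might_sleep() catches the invalid call
and prints the splat above; without it, the handler can actually schedule
with interrupts disabled, which matches the deadlocks Peter is seeing.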