On Tue, Jun 24, 2014 at 11:35 AM, Oleg Nesterov <o...@redhat.com> wrote:
> On 06/24, Kees Cook wrote:
>> On Tue, Jun 24, 2014 at 9:52 AM, Oleg Nesterov <o...@redhat.com> wrote:
>> >> @@ -1142,6 +1168,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>> >>  {
>> >>  	int retval;
>> >>  	struct task_struct *p;
>> >> +	unsigned long irqflags;
>> >>
>> >>  	if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
>> >>  		return ERR_PTR(-EINVAL);
>> >> @@ -1196,7 +1223,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>> >>  		goto fork_out;
>> >>
>> >>  	ftrace_graph_init_task(p);
>> >> -	get_seccomp_filter(p);
>> >>
>> >>  	rt_mutex_init_task(p);
>> >>
>> >> @@ -1434,7 +1460,13 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>> >>  		p->parent_exec_id = current->self_exec_id;
>> >>  	}
>> >>
>> >> -	spin_lock(&current->sighand->siglock);
>> >> +	spin_lock_irqsave(&current->sighand->siglock, irqflags);
>> >> +
>> >> +	/*
>> >> +	 * Copy seccomp details explicitly here, in case they were changed
>> >> +	 * before holding tasklist_lock.
>> >> +	 */
>> >> +	copy_seccomp(p);
>> >>
>> >>  	/*
>> >>  	 * Process group and session signals need to be delivered to just the
>> >> @@ -1446,7 +1478,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>> >>  	 */
>> >>  	recalc_sigpending();
>> >>  	if (signal_pending(current)) {
>> >> -		spin_unlock(&current->sighand->siglock);
>> >> +		spin_unlock_irqrestore(&current->sighand->siglock, irqflags);
>> >>  		write_unlock_irq(&tasklist_lock);
>> >>  		retval = -ERESTARTNOINTR;
>> >>  		goto bad_fork_free_pid;
>> >> @@ -1486,7 +1518,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>> >>  	}
>> >>
>> >>  	total_forks++;
>> >> -	spin_unlock(&current->sighand->siglock);
>> >> +	spin_unlock_irqrestore(&current->sighand->siglock, irqflags);
>> >>  	write_unlock_irq(&tasklist_lock);
>> >>  	proc_fork_connector(p);
>> >>  	cgroup_post_fork(p);
>> >
>> > It seems that the only change copy_process() needs is copy_seccomp() under
>> > the locks.
>> > Everything else (irqflags games) looks obviously unneeded?
>>
>> I got irq lock dep warnings without these changes.
>
> With or without your patches? Could you show the warning?

It seems it's only needed in seccomp itself (I can drop the changes in
kernel/fork.c). I get no warnings in that case. If I also remove irq
handling from seccomp, I see:

[ 17.444328]
[ 17.445031] =================================
[ 17.445031] [ INFO: inconsistent lock state ]
[ 17.445031] 3.16.0-rc2+ #289 Not tainted
[ 17.445031] ---------------------------------
[ 17.445031] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 17.445031] seccomp_bpf_tes/1987 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 17.445031] (&(&sighand->siglock)->rlock){?.....}, at: [<ffffffff9e2fb7e5>] do_seccomp.part.7+0x25/0xc0
[ 17.445031] {IN-HARDIRQ-W} state was registered at:
[ 17.445031] [<ffffffff9e2a422a>] mark_irqflags+0x19a/0x1b0
[ 17.445031] [<ffffffff9e2a4542>] __lock_acquire+0x302/0x9e0
[ 17.445031] [<ffffffff9e2a5325>] lock_acquire+0x95/0x1e0
[ 17.445031] [<ffffffff9ebda204>] _raw_spin_lock+0x34/0x50
[ 17.445031] [<ffffffff9e263e70>] __lock_task_sighand+0xa0/0x230
[ 17.445031] [<ffffffff9e2652cf>] send_sigqueue+0x3f/0x280
[ 17.445031] [<ffffffff9e2781e3>] posix_timer_event+0x83/0x140
[ 17.445031] [<ffffffff9e2782f2>] posix_timer_fn+0x52/0xd0
[ 17.445031] [<ffffffff9e27d3bc>] __run_hrtimer+0x7c/0x420
[ 17.445031] [<ffffffff9e27de27>] hrtimer_interrupt+0x107/0x260
[ 17.445031] [<ffffffff9e235aa6>] local_apic_timer_interrupt+0x36/0x60
[ 17.445031] [<ffffffff9e235f1e>] smp_apic_timer_interrupt+0x3e/0x60
[ 17.445031] [<ffffffff9ebdbf2f>] apic_timer_interrupt+0x6f/0x80
[ 17.445031] [<ffffffff9e20d99a>] arch_cpu_idle+0xa/0x10
[ 17.445031] [<ffffffff9e29d587>] cpuidle_idle_call+0x157/0x3d0
[ 17.445031] [<ffffffff9e29d945>] cpu_idle_loop+0x145/0x370
[ 17.445031] [<ffffffff9e29dbc6>] cpu_startup_entry+0x56/0x60
[ 17.445031] [<ffffffff9e234544>] start_secondary+0xd4/0xe0
[ 17.445031] irq event stamp: 243
[ 17.445031] hardirqs last enabled at (243): [<ffffffff9e2b9d00>] __call_rcu.constprop.63+0x70/0x120
[ 17.445031] hardirqs last disabled at (242): [<ffffffff9e2b9cd2>] __call_rcu.constprop.63+0x42/0x120
[ 17.445031] softirqs last enabled at (50): [<ffffffff9e256310>] __do_softirq+0x1d0/0x4d0
[ 17.445031] softirqs last disabled at (21): [<ffffffff9e2568be>] irq_exit+0x8e/0xb0
[ 17.445031]
[ 17.445031] other info that might help us debug this:
[ 17.445031] Possible unsafe locking scenario:
[ 17.445031]
[ 17.445031] CPU0
[ 17.445031] ----
[ 17.445031] lock(&(&sighand->siglock)->rlock);
[ 17.445031] <Interrupt>
[ 17.445031] lock(&(&sighand->siglock)->rlock);
[ 17.445031]
[ 17.445031] *** DEADLOCK ***
[ 17.445031]
[ 17.445031] no locks held by seccomp_bpf_tes/1987.
[ 17.445031]
[ 17.445031] stack backtrace:
[ 17.445031] CPU: 0 PID: 1987 Comm: seccomp_bpf_tes Not tainted 3.16.0-rc2+ #289
[ 17.445031] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 17.445031] ffffffff9f71dd90 ffff88007878bc48 ffffffff9ebc4fb4 0000000000000007
[ 17.445031] ffff880077fb3570 ffff88007878bca8 ffffffff9ebbad0d 0000000000000000
[ 17.445031] 0000000000000001 00007fff00000001 ffff88007878bd70 ffff88007878bd18
[ 17.445031] Call Trace:
[ 17.445031] [<ffffffff9ebc4fb4>] dump_stack+0x4e/0x68
[ 17.445031] [<ffffffff9ebbad0d>] print_usage_bug+0x1f1/0x202
[ 17.445031] [<ffffffff9e2a26e0>] ? check_usage_forwards+0x150/0x150
[ 17.445031] [<ffffffff9ebbad8a>] mark_lock_irq+0x6c/0x137
[ 17.445031] [<ffffffff9e2a3ff5>] mark_lock+0x125/0x1c0
[ 17.445031] [<ffffffff9e2a41c8>] mark_irqflags+0x138/0x1b0
[ 17.445031] [<ffffffff9e2a4542>] __lock_acquire+0x302/0x9e0
[ 17.445031] [<ffffffff9e383a0c>] ? create_object+0x21c/0x2d0
[ 17.445031] [<ffffffff9e2a5325>] lock_acquire+0x95/0x1e0
[ 17.445031] [<ffffffff9e2fb7e5>] ? do_seccomp.part.7+0x25/0xc0
[ 17.445031] [<ffffffff9e2a5e55>] ? trace_hardirqs_on_caller+0x105/0x1d0
[ 17.445031] [<ffffffff9ebda204>] _raw_spin_lock+0x34/0x50
[ 17.445031] [<ffffffff9e2fb7e5>] ? do_seccomp.part.7+0x25/0xc0
[ 17.445031] [<ffffffff9e27ffc5>] ? abort_creds+0x45/0x50
[ 17.445031] [<ffffffff9e2fb7e5>] do_seccomp.part.7+0x25/0xc0
[ 17.445031] [<ffffffff9e2fbf28>] do_seccomp+0x18/0x40
[ 17.445031] [<ffffffff9e2fc1df>] prctl_set_seccomp+0x2f/0x40
[ 17.445031] [<ffffffff9e26bce1>] SyS_prctl+0x141/0x4b0
[ 17.445031] [<ffffffff9e2f306c>] ? __audit_syscall_entry+0x8c/0xe0
[ 17.445031] [<ffffffff9e59556e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 17.445031] [<ffffffff9ebdb012>] system_call_fastpath+0x16/0x1b

I'll drop the fork.c changes, and keep the seccomp.c irqflags.

Thanks!

-Kees

--
Kees Cook
Chrome OS Security
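
A note on the "Possible unsafe locking scenario" in the report: siglock is also taken from hardirq context (the posix timer path in the trace), so taking it in process context with interrupts enabled could self-deadlock if that interrupt arrives on the same CPU while the lock is held. What follows is only a userspace sketch of that pattern, not kernel code. It assumes a POSIX system: SIGALRM stands in for the timer interrupt, an atomic_flag stands in for siglock, and sigprocmask() plays the role of spin_lock_irqsave()/spin_unlock_irqrestore(). The names install_handler, unsafe_section and safe_section are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for sighand->siglock (assumption: a simple test-and-set flag). */
static atomic_flag fake_siglock = ATOMIC_FLAG_INIT;
/* Counts how often the "interrupt" found the lock already held. */
static volatile sig_atomic_t contended;

/* Stand-in for the hardirq path that takes siglock. */
static void fake_irq(int sig)
{
	(void)sig;
	if (atomic_flag_test_and_set(&fake_siglock)) {
		/* A real spinlock would spin here forever: same-CPU deadlock. */
		contended++;
		return;
	}
	atomic_flag_clear(&fake_siglock);
}

static void install_handler(void)
{
	struct sigaction sa;

	sa.sa_handler = fake_irq;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	assert(sigaction(SIGALRM, &sa, NULL) == 0);
}

/* Like plain spin_lock(): the "interrupt" can fire while the lock is held. */
static int unsafe_section(void)
{
	contended = 0;
	while (atomic_flag_test_and_set(&fake_siglock))
		;
	raise(SIGALRM);		/* handler runs now, with the lock held */
	atomic_flag_clear(&fake_siglock);
	return (int)contended;
}

/* Like spin_lock_irqsave(): the "interrupt" is held off until unlock. */
static int safe_section(void)
{
	sigset_t block, saved;

	contended = 0;
	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	sigprocmask(SIG_BLOCK, &block, &saved);		/* "save flags, disable" */
	while (atomic_flag_test_and_set(&fake_siglock))
		;
	raise(SIGALRM);		/* stays pending while blocked */
	atomic_flag_clear(&fake_siglock);
	sigprocmask(SIG_SETMASK, &saved, NULL);		/* "restore": handler runs here */
	return (int)contended;
}
```

After install_handler(), unsafe_section() reports that the handler found the lock held, while safe_section() does not. That is the userspace analogue of why the irq-disabling lock variants stay in seccomp.c.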