Re: vfs: lockdep splat with prepare_bprm_creds

2013-03-17 Thread Oleg Nesterov
On 03/16, Al Viro wrote:
>
> On Fri, Mar 15, 2013 at 07:19:56PM +0100, Oleg Nesterov wrote:
>
> > Cough... I am shy to disclose my ignorance, but could you explain how
> > open_exec()->do_filp_open(MAY_EXEC) can succeed in this case? At least
> > acl_permission_check() looks as if open_exec() should fail...
>
> Umm... point.  It might be a false positive, actually - some other
> seq_file-based sucker (while chmod +x /proc/self/stack will fail,
> chmod +x /proc/vmstat won't) that could be fed to execve(), leading to
>   1) kernel_read() from execve() can grab m.lock for *some* seq_file m,
> while holding ->cred_guard_mutex

Yes, thanks.

I am wondering if lock_trace() is really useful...

Let's ignore proc_pid_syscall() and proc_pid_personality(). Suppose we
change proc_pid_stack():

	int proc_pid_stack(...)
	{
		...

		save_stack_trace_tsk(task, &trace);
		if (!ptrace_may_access(task, PTRACE_MODE_ATTACH))
			return -EPERM;

		for (i = 0; i < trace.nr_entries; i++)
			seq_printf(...);

		return 0;
	}

Sure, without cred_guard_mutex we can race with install_exec_creds(). But
is it a problem in practice? In any case lock_trace() can't protect against
commit_creds()...

We can even do

	task_lock(task);
	err = __ptrace_may_access(task, mode);
	if (!err)
		save_stack_trace_tsk(...);
	task_unlock(task);

This way task_lock() protects us against exec_mmap(). And even if exec_mmap()
was already called and the task is about to do install_exec_creds(), we can't
show the stack of this process "after" exec.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: vfs: lockdep splat with prepare_bprm_creds

2013-03-16 Thread Al Viro
On Fri, Mar 15, 2013 at 07:19:56PM +0100, Oleg Nesterov wrote:
> On 03/15, Al Viro wrote:
> >
> > On Fri, Mar 15, 2013 at 12:07:14AM -0400, Sasha Levin wrote:
> > > Hi all,
> > >
> > > While fuzzing with trinity inside a KVM tools guest running latest -next kernel
> > > I've stumbled on the following.
> > >
> > > Dave Jones reported something similar, but that seemed to involve cgroup's mutex
> > > and didn't seem like it was the same issue as this one.
> >
> > Lovely...  It's an execve() attempt on a "binary" that is, in fact, a procfs
> > file (/proc//stack),

> Cough... I am shy to disclose my ignorance, but could you explain how
> open_exec()->do_filp_open(MAY_EXEC) can succeed in this case? At least
> acl_permission_check() looks as if open_exec() should fail...

Umm... point.  It might be a false positive, actually - some other
seq_file-based sucker (while chmod +x /proc/self/stack will fail,
chmod +x /proc/vmstat won't) that could be fed to execve(), leading to
1) kernel_read() from execve() can grab m.lock for *some* seq_file m,
while holding ->cred_guard_mutex
2) read() on /proc/self/stack tries to grab ->cred_guard_mutex,
while holding m.lock for a different seq_file m
... with lockdep having no idea that there's a reason why (1) and (2) can't
have the same seq_file involved, said reason being that all files with ->read()
trying to grab ->cred_guard_mutex don't have exec bit set *and* are impossible
to chmod.


Re: vfs: lockdep splat with prepare_bprm_creds

2013-03-15 Thread Oleg Nesterov
On 03/15, Al Viro wrote:
>
> On Fri, Mar 15, 2013 at 12:07:14AM -0400, Sasha Levin wrote:
> > Hi all,
> >
> > While fuzzing with trinity inside a KVM tools guest running latest -next kernel
> > I've stumbled on the following.
> >
> > Dave Jones reported something similar, but that seemed to involve cgroup's mutex
> > and didn't seem like it was the same issue as this one.
>
> Lovely...  It's an execve() attempt on a "binary" that is, in fact, a procfs
> file (/proc//stack),

probably... other lock_trace() callers can't generate this lockdep output
afaics.

> with its ->read() trying to grab ->cred_guard_mutex.
> The fact that it's seq_file-based is irrelevant here - all that matters is
> that we have ->read() for some file trying to grab ->cred_guard_mutex.

Yes, perhaps the patch below makes sense anyway as a cleanup, but obviously
it can't help.

Cough... I am shy to disclose my ignorance, but could you explain how
open_exec()->do_filp_open(MAY_EXEC) can succeed in this case? At least
acl_permission_check() looks as if open_exec() should fail...

Just curious, thanks in advance.

Oleg.

--- x/fs/proc/base.c
+++ x/fs/proc/base.c
@@ -317,12 +317,12 @@ static int proc_pid_stack(struct seq_fil
 	err = lock_trace(task);
 	if (!err) {
 		save_stack_trace_tsk(task, &trace);
+		unlock_trace(task);
 
 		for (i = 0; i < trace.nr_entries; i++) {
 			seq_printf(m, "[<%pK>] %pS\n",
 				   (void *)entries[i], (void *)entries[i]);
 		}
-		unlock_trace(task);
 	}
 	kfree(entries);
 



Re: vfs: lockdep splat with prepare_bprm_creds

2013-03-14 Thread Al Viro
On Fri, Mar 15, 2013 at 12:07:14AM -0400, Sasha Levin wrote:
> Hi all,
> 
> While fuzzing with trinity inside a KVM tools guest running latest -next kernel
> I've stumbled on the following.
> 
> Dave Jones reported something similar, but that seemed to involve cgroup's mutex
> and didn't seem like it was the same issue as this one.

Lovely...  It's an execve() attempt on a "binary" that is, in fact, a procfs
file (/proc//stack), with its ->read() trying to grab ->cred_guard_mutex.
The fact that it's seq_file-based is irrelevant here - all that matters is
that we have ->read() for some file trying to grab ->cred_guard_mutex.

It's not *quite* a deadlock, though - all these guys are using
mutex_lock_killable()...


vfs: lockdep splat with prepare_bprm_creds

2013-03-14 Thread Sasha Levin
Hi all,

While fuzzing with trinity inside a KVM tools guest running latest -next kernel
I've stumbled on the following.

Dave Jones reported something similar, but that seemed to involve cgroup's mutex
and didn't seem like it was the same issue as this one.


[  549.400084] ======================================================
[  549.400084] [ INFO: possible circular locking dependency detected ]
[  549.400084] 3.9.0-rc2-next-20130314-sasha-00046-g3897511 #295 Tainted: G W
[  549.400084] ------------------------------------------------------
[  549.420729] can: request_module (can-proto-0) failed.
[  549.400084] trinity-child20/12048 is trying to acquire lock:
[  549.400084]  (&p->lock){+.+.+.}, at: [] seq_read+0x3a/0x3d0
[  549.400084]
[  549.400084] but task is already holding lock:
[  549.400084]  (&sig->cred_guard_mutex){+.+.+.}, at: [] prepare_bprm_creds+0x31/0x80
[  549.400084]
[  549.400084] which lock already depends on the new lock.
[  549.400084]
[  549.400084]
[  549.400084] the existing dependency chain (in reverse order) is:
[  549.400084]
-> #1 (&sig->cred_guard_mutex){+.+.+.}:
[  549.400084][] check_prevs_add+0xba/0x1a0
[  549.400084][] validate_chain.isra.21+0x6d0/0x800
[  549.400084][] __lock_acquire+0xa23/0xb10
[  549.400084][] lock_acquire+0x1ca/0x270
[  549.400084][] __mutex_lock_common+0x5a/0x5a0
[  549.400084][] mutex_lock_killable_nested+0x3f/0x50
[  549.400084][] lock_trace+0x28/0x70
[  549.400084][] proc_pid_stack+0x65/0xf0
[  549.400084][] proc_single_show+0x5a/0xa0
[  549.400084][] seq_read+0x1af/0x3d0
[  549.400084][] do_loop_readv_writev+0x4b/0x90
[  549.400084][] do_readv_writev+0xf6/0x1d0
[  549.400084][] vfs_readv+0x3e/0x60
[  549.400084][] SyS_readv+0x50/0xd0
[  549.400084][] tracesys+0xe1/0xe6
[  549.400084]
-> #0 (&p->lock){+.+.+.}:
[  549.400084][] check_prev_add+0x145/0x710
[  549.400084][] check_prevs_add+0xba/0x1a0
[  549.400084][] validate_chain.isra.21+0x6d0/0x800
[  549.400084][] __lock_acquire+0xa23/0xb10
[  549.400084][] lock_acquire+0x1ca/0x270
[  549.400084][] __mutex_lock_common+0x5a/0x5a0
[  549.400084][] mutex_lock_nested+0x3f/0x50
[  549.400084][] seq_read+0x3a/0x3d0
[  549.400084][] proc_reg_read+0x201/0x230
[  549.400084][] vfs_read+0xb5/0x180
[  549.400084][] kernel_read+0x41/0x60
[  549.400084][] prepare_binprm+0x18d/0x1b0
[  549.400084][] do_execve_common.isra.21+0x1b6/0x380
[  549.400084][] do_execve+0x13/0x20
[  549.400084][] SyS_execve+0x3e/0x60
[  549.400084][] stub_execve+0x69/0xa0
[  549.400084]
[  549.400084] other info that might help us debug this:
[  549.400084]
[  549.400084]  Possible unsafe locking scenario:
[  549.400084]
[  549.400084]        CPU0                    CPU1
[  549.400084]        ----                    ----
[  549.400084]   lock(&sig->cred_guard_mutex);
[  549.400084]                                lock(&p->lock);
[  549.400084]                                lock(&sig->cred_guard_mutex);
[  549.400084]   lock(&p->lock);
[  549.400084]
[  549.400084]  *** DEADLOCK ***
[  549.400084]
[  549.400084] 1 lock held by trinity-child20/12048:
[  549.400084]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [] prepare_bprm_creds+0x31/0x80
[  549.400084]
[  549.400084] stack backtrace:
[  549.400084] Pid: 12048, comm: trinity-child20 Tainted: G W 3.9.0-rc2-next-20130314-sasha-00046-g3897511 #295
[  549.400084] Call Trace:
[  549.400084]  [] print_circular_bug+0xd3/0xe4
[  549.400084]  [] check_prev_add+0x145/0x710
[  549.400084]  [] check_prevs_add+0xba/0x1a0
[  549.400084]  [] ? sched_clock+0x15/0x20
[  549.400084]  [] validate_chain.isra.21+0x6d0/0x800
[  549.400084]  [] __lock_acquire+0xa23/0xb10
[  549.400084]  [] ? kvm_clock_read+0x38/0x70
[  549.400084]  [] ? lock_release_holdtime+0x12e/0x140
[  549.400084]  [] ? sched_clock+0x15/0x20
[  549.400084]  [] ? sched_clock_local+0x25/0x90
[  549.400084]  [] ? deactivate_slab+0x7d6/0x820
[  549.400084]  [] lock_acquire+0x1ca/0x270
[  549.400084]  [] ? seq_read+0x3a/0x3d0
[  549.400084]  [] ? sched_clock_local+0x25/0x90
[  549.400084]  [] ? seq_lseek+0x110/0x110
[  549.400084]  [] __mutex_lock_common+0x5a/0x5a0
[  549.400084]  [] ? seq_read+0x3a/0x3d0
[  549.400084]  [] ? __lock_is_held+0x52/0x80
[  549.400084]  [] ? seq_read+0x3a/0x3d0
[  549.400084]  [] ? seq_lseek+0x110/0x110
[  549.400084]  [] mutex_lock_nested+0x3f/0x50
[  549.400084]  [] seq_read+0x3a/0x3d0
[  549.400084]  [] ? delay_tsc+0xdd/0x110
[  549.400084]  [] ? seq_lseek+0x110/0x110
[  549.400084]  [] ? seq_lseek+0x110/0x110
[  549.400084]  [] proc_reg_read+0x201/0x230
[  549.400084]  [] ? proc_reg_write+0x230/0x230
[  549.400084]  [] vfs_read+0xb5/0x180
[  549.400084]  [] kernel_read+0x41/0x60
[  549.400084]  [] prepare_binprm+0x18d/0x1b0
[  549.400084]  [] do