On Tue, Apr 28, 2020 at 10:36 PM Linus Torvalds <[email protected]> wrote:
> On Tue, Apr 28, 2020 at 12:08 PM Oleg Nesterov <[email protected]> wrote:
> >
> > Oops. I can update that old patch but somehow I thought there is a better
> > plan which I don't yet understand...
>
> I don't think any plan survived reality.
>
> Unless we want to do something *really* hacky.. The attached patch is
> not meant to be serious.
>
> > And, IIRC, Jan had some ideas how to rework the new creds calculation in
> > execve paths to avoid the cred_guard_mutex deadlock?
>
> I'm not sure how you'd do that.
>
> Execve() fundamentally needs to serialize with PTRACE_ATTACH somehow,
> since the whole point is that "tsk->ptrace" changes how the
> credentials are interpreted.
>
> So PTRACE_ATTACH doesn't really _change_ the credentials, but it very
> much changes what execve() will do with them.
>
> But I guess we could do a "if somebody attached to us while we did the
> execve(), just repeat the whole thing"
>
> Jann, what was your clever idea? Maybe it got lost in the long thread..

My clever/horrible/overly-complex idea was basically:

In execve:

 - After the point of no return, but before we start waiting for the
   other threads to go away, finish calculating our post-execve creds
   and stash them somewhere in the task_struct or so.
 - Drop the cred_guard_mutex.
 - Wait for the other threads to die.
 - Take the cred_guard_mutex again.
 - Clear out the pointer in the task_struct.
 - Finish execve and install the new creds.
 - Drop the cred_guard_mutex again.

Then in ptrace_may_access, after taking the cred_guard_mutex, we'd
know that the target task is either outside execve or in the middle of
execve, with old and new credentials known; and then we could say "you
only get to access that task if you're capable relative to *both* its
old and new credentials, since the task currently has both state from
the old executable and from the new one".

(Other users that expect to use cred_guard_mutex to synchronize with
execve would also have to be changed appropriately; e.g. seccomp tsync
would have to bail out if the task turns out to be in execve after the
mutex has been acquired.)

So I think we can conceptually fix the deadlock, but it requires a bit
of refactoring. (I have an old branch somewhere in which I tried to
implement this, and where I did a bunch of refactoring around
ptrace_may_access() so that e.g. the LSM hooks for ptrace can be
invoked twice when the target task is in execve, and so that they take
the target's cred* as an argument.)
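
To make the "capable relative to both" rule a bit more concrete, here is a
rough userspace toy model in C. It is not kernel code and not what the old
branch looks like: struct toy_cred, struct toy_task, the exec_pending_cred
field and all toy_* helpers are made-up names for illustration, the
privilege check is reduced to comparing one integer, and the
cred_guard_mutex itself is not modelled (the caller is simply assumed to
hold it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cred: a single privilege level (illustrative only). */
struct toy_cred {
	int level; /* higher means more privileged */
};

/* Toy stand-in for task_struct; exec_pending_cred is a hypothetical field. */
struct toy_task {
	const struct toy_cred *cred;              /* currently installed creds */
	const struct toy_cred *exec_pending_cred; /* non-NULL only mid-execve */
};

/* Hypothetical "capable relative to" check: tracer must be at least as privileged. */
static bool toy_capable_over(const struct toy_cred *tracer,
			     const struct toy_cred *target)
{
	return tracer->level >= target->level;
}

/* execve side: stash the computed post-execve creds before dropping the mutex. */
static void toy_exec_stash(struct toy_task *t, const struct toy_cred *new_cred)
{
	t->exec_pending_cred = new_cred;
}

/* execve side: with the mutex retaken, clear the pointer and install the creds. */
static void toy_exec_finish(struct toy_task *t)
{
	t->cred = t->exec_pending_cred;
	t->exec_pending_cred = NULL;
}

/*
 * ptrace side: the caller is assumed to hold the (unmodelled) mutex, so the
 * target is either outside execve or mid-execve with both creds known.
 */
static bool toy_ptrace_may_access(const struct toy_cred *tracer,
				  const struct toy_task *target)
{
	assert(tracer && target && target->cred);

	if (!toy_capable_over(tracer, target->cred))
		return false;
	/* Mid-execve: the tracer must also pass against the new creds. */
	if (target->exec_pending_cred &&
	    !toy_capable_over(tracer, target->exec_pending_cred))
		return false;
	return true;
}
```

In this model an unprivileged tracer that could attach before execve is
refused while a more privileged set of creds is pending, and stays refused
once those creds are installed; a tracer that is capable relative to both
sets is allowed throughout.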

