On Wed, Jan 27, 2016 at 04:36:02PM -0800, Andy Lutomirski wrote:
> On Wed, Jan 27, 2016 at 9:22 AM, Jann Horn <j...@thejh.net> wrote:
> > I think it sounds good from a security perspective.
>
> I'm a bit late to the game, but I have a question: why should this be
> keyed to the *root* uid of the namespace in particular?  Certainly if
> user foo trusts the cap bits on some file, then user foo might trust
> those caps to be exerted over any namespace that user foo owns, since
> user foo owns the namespace.
...

Tying it to a kuid which represents the userns->owner of any namespace
in which the capability will be honored might be fine with me.  Is that
what you mean?  So if uid 1000 creates a userns mapping uids
100000-200000, and 100000 in that container puts X=pe on /bin/foo, uid
101000 in that container runs /bin/foo with privilege X.  Uid 101000 in
someone else's container does not.

Although, if I create two containers and provide them different uidmaps,
it may well be because I want them segregated and want to minimize the
chances of one container breaking out into the other.  This risks
breaking that.

> But another option would be to include a list of uids and gids such
> that the cap bits on the file are trusted by any namespace that maps
> only uids and gids in the list.  After all, the existence of a
> namespace with root user foo that also maps bar and baz along with a
> file with caps set means that, if baz can get to the file and
> permissions are set appropriately, then baz now owns bar (via any
> number of fs-related capabilities).  So maybe bar and baz should have
> to be listed as well.
>
> But maybe this doesn't matter.
>
> In any event, at the end of the day, the right answer to all of this
> is to stop using setuid and stop using cap bits too and start using
> privileged daemons or other things that don't use the eternally
> fragile grant-privilege-on-execve mechanisms.

Heh, that's why I wrote a p9auth driver a few years ago, but it was too
early for such a thing.
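
To make the userns->owner keying above concrete, here is a rough toy
model in Python.  It is not kernel code, and every name in it (UserNS,
FileCap, caps_honored, the owner uid 2000, the second uid range) is made
up for illustration.  It assumes the file capability records the kuid of
the namespace owner that is trusted, and that the caps are honored only
when the executing task's namespace has that same owner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserNS:
    """Toy user namespace: who created it and which host kuids it maps."""
    owner_kuid: int   # kuid of the user who created the namespace
    first_kuid: int   # first host kuid mapped into the namespace
    count: int        # number of consecutive kuids mapped

    def maps(self, kuid: int) -> bool:
        return self.first_kuid <= kuid < self.first_kuid + self.count


@dataclass(frozen=True)
class FileCap:
    """Toy file capability tagged with the namespace owner it trusts."""
    caps: frozenset
    trusted_owner_kuid: int


def caps_honored(filecap: FileCap, ns: UserNS, task_kuid: int) -> frozenset:
    """Return the caps a task would gain on exec under the assumed rule."""
    if not ns.maps(task_kuid):
        return frozenset()        # task's uid is not mapped in this namespace
    if ns.owner_kuid != filecap.trusted_owner_kuid:
        return frozenset()        # namespace belongs to a different owner
    return filecap.caps


# uid 1000 creates a container mapping kuids starting at 100000.
ns_a = UserNS(owner_kuid=1000, first_kuid=100000, count=100000)
# A second container by the same owner, with a different (hypothetical) uidmap.
ns_b = UserNS(owner_kuid=1000, first_kuid=300000, count=100000)
# Someone else's container (hypothetical owner 2000) mapping the same range.
ns_other = UserNS(owner_kuid=2000, first_kuid=100000, count=100000)

# 100000 in ns_a puts X=pe on /bin/foo; the tag is ns_a's owner.
foo_cap = FileCap(caps=frozenset({"X"}), trusted_owner_kuid=ns_a.owner_kuid)

in_same_container = caps_honored(foo_cap, ns_a, 101000)
in_other_owner = caps_honored(foo_cap, ns_other, 101000)
in_sibling_container = caps_honored(foo_cap, ns_b, 301000)
```

Under this model in_same_container holds X and in_other_owner is empty,
matching the example above.  in_sibling_container also holds X, which is
the segregation worry: if a task in the second container can reach the
file, the different uidmaps no longer keep the two apart.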