Re: [PATCH 1/1] simplified security.nscapability xattr
On Mon, May 16, 2016 at 04:15:23PM -0500, Serge E. Hallyn wrote: > Quoting Serge E. Hallyn (se...@hallyn.com): > ... > > There's a problem though. The above suffices to prevent an unprivileged > > user > > in a user_ns from unsharing a user_ns to write a file capability and exploit > > that capability in the ns where he is unprivileged. With one exception, > > which > > is the case where the unprivileged user is mapped to the same kuid which > > created the namespace. So if uid 1000 on the host creates a namespace > > where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace > > can create a new user_ns, write the xattr, and exploit it from the > > parent namespace. This is not an uncommon case. I'm not sure what to do > > about > > it. > > Ok I think I've convinced myself that requiring a kuid 0 in the container > and storing that in the security.nscapability is best solution. The DAC > objection is imo not really valid - we don't have to give uid 0 in the > container any special privilege, we just require that the ns have a uid 0 > mapping. I have not been able to think of any other reliable way to verify > that the writer of the capability is authorized to grant privilege to the > file when executed by current. > > I'm going to proceed with another POC based on the following design: > > 1. no new syscalls at the moment. You can choose to set/query > security.nscapability, but can also just set security.capability from > a user_ns and have the kernel transparently set a security.nscapability > entry for you. > > 2. For now just a single security.nscapability entry, but in a format > that turning it into an array will be a trivial change > > 3. When running file foo which has a security.nscapability for kuid 10, > then any namespace where kuid 10 is root - or which has an ancestor ns > where > that is the case - will run the file with the listed capabilities. > > 4. When doing getxattr of security.capability from a user_ns, if there is a > security.capability entry, that will be returned; else if there is a valid > security.nscapability for your ns, that will be returned. > > 5. when doing a setxattr of security.capability from a user_ns, if there is > a security.nscapability entry, you get EBUSY; else a security.nscapability > with your root kuid will be written provided that (a) you are privileged > over your namespace, (b) you are privileged over your root uid, (c) the > file owner maps into your namespace. Stéphane pointed out this isn't quite right. The EBUSY will happen if a security.nscapability is defined with a kuid over which the writer is not privileged - else it will overwrite. It will also happen if security.capbility is set. > 6. when doing a getxattr of security.nscapability, the entry will be shown > with kuid mapped into your namespace or -1 if the uid does not map into > your ns. > > 7. when doing a setxattr of security.nscapability, if an entry exists, you > get -EBUSY; if you are not privileged over your ns, your root uid, and > the file owner, then you get -EPERM; the xattr includes a uid field, which > must be either 0 or a value valid in your ns. The value will be converted > to a kuid and stored on disk. (Seth, I'm not sure offhand how that should > mesh with your patches, we can talk about it after I send the next patch, > which I'm quite certain will handle it wrongly) > > 8. If a security.capability exists, it will override any security.nscapability > at execve() (so, inverse of my previous two patches). > > -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
On Mon, May 16, 2016 at 04:15:23PM -0500, Serge E. Hallyn wrote: > Quoting Serge E. Hallyn (se...@hallyn.com): > ... > > There's a problem though. The above suffices to prevent an unprivileged > > user > > in a user_ns from unsharing a user_ns to write a file capability and exploit > > that capability in the ns where he is unprivileged. With one exception, > > which > > is the case where the unprivileged user is mapped to the same kuid which > > created the namespace. So if uid 1000 on the host creates a namespace > > where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace > > can create a new user_ns, write the xattr, and exploit it from the > > parent namespace. This is not an uncommon case. I'm not sure what to do > > about > > it. > > Ok I think I've convinced myself that requiring a kuid 0 in the container > and storing that in the security.nscapability is best solution. The DAC > objection is imo not really valid - we don't have to give uid 0 in the > container any special privilege, we just require that the ns have a uid 0 > mapping. I have not been able to think of any other reliable way to verify > that the writer of the capability is authorized to grant privilege to the > file when executed by current. > > I'm going to proceed with another POC based on the following design: > > 1. no new syscalls at the moment. You can choose to set/query > security.nscapability, but can also just set security.capability from > a user_ns and have the kernel transparently set a security.nscapability > entry for you. > > 2. For now just a single security.nscapability entry, but in a format > that turning it into an array will be a trivial change > > 3. When running file foo which has a security.nscapability for kuid 10, > then any namespace where kuid 10 is root - or which has an ancestor ns > where > that is the case - will run the file with the listed capabilities. > > 4. When doing getxattr of security.capability from a user_ns, if there is a > security.capability entry, that will be returned; else if there is a valid > security.nscapability for your ns, that will be returned. > > 5. when doing a setxattr of security.capability from a user_ns, if there is > a security.nscapability entry, you get EBUSY; else a security.nscapability > with your root kuid will be written provided that (a) you are privileged > over your namespace, (b) you are privileged over your root uid, (c) the > file owner maps into your namespace. Stéphane pointed out this isn't quite right. The EBUSY will happen if a security.nscapability is defined with a kuid over which the writer is not privileged - else it will overwrite. It will also happen if security.capbility is set. > 6. when doing a getxattr of security.nscapability, the entry will be shown > with kuid mapped into your namespace or -1 if the uid does not map into > your ns. > > 7. when doing a setxattr of security.nscapability, if an entry exists, you > get -EBUSY; if you are not privileged over your ns, your root uid, and > the file owner, then you get -EPERM; the xattr includes a uid field, which > must be either 0 or a value valid in your ns. The value will be converted > to a kuid and stored on disk. (Seth, I'm not sure offhand how that should > mesh with your patches, we can talk about it after I send the next patch, > which I'm quite certain will handle it wrongly) > > 8. If a security.capability exists, it will override any security.nscapability > at execve() (so, inverse of my previous two patches). > > -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Serge E. Hallyn (se...@hallyn.com): ... > There's a problem though. The above suffices to prevent an unprivileged user > in a user_ns from unsharing a user_ns to write a file capability and exploit > that capability in the ns where he is unprivileged. With one exception, which > is the case where the unprivileged user is mapped to the same kuid which > created the namespace. So if uid 1000 on the host creates a namespace > where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace > can create a new user_ns, write the xattr, and exploit it from the > parent namespace. This is not an uncommon case. I'm not sure what to do > about > it. Ok I think I've convinced myself that requiring a kuid 0 in the container and storing that in the security.nscapability is best solution. The DAC objection is imo not really valid - we don't have to give uid 0 in the container any special privilege, we just require that the ns have a uid 0 mapping. I have not been able to think of any other reliable way to verify that the writer of the capability is authorized to grant privilege to the file when executed by current. I'm going to proceed with another POC based on the following design: 1. no new syscalls at the moment. You can choose to set/query security.nscapability, but can also just set security.capability from a user_ns and have the kernel transparently set a security.nscapability entry for you. 2. For now just a single security.nscapability entry, but in a format that turning it into an array will be a trivial change 3. When running file foo which has a security.nscapability for kuid 10, then any namespace where kuid 10 is root - or which has an ancestor ns where that is the case - will run the file with the listed capabilities. 4. When doing getxattr of security.capability from a user_ns, if there is a security.capability entry, that will be returned; else if there is a valid security.nscapability for your ns, that will be returned. 5. when doing a setxattr of security.capability from a user_ns, if there is a security.nscapability entry, you get EBUSY; else a security.nscapability with your root kuid will be written provided that (a) you are privileged over your namespace, (b) you are privileged over your root uid, (c) the file owner maps into your namespace. 6. when doing a getxattr of security.nscapability, the entry will be shown with kuid mapped into your namespace or -1 if the uid does not map into your ns. 7. when doing a setxattr of security.nscapability, if an entry exists, you get -EBUSY; if you are not privileged over your ns, your root uid, and the file owner, then you get -EPERM; the xattr includes a uid field, which must be either 0 or a value valid in your ns. The value will be converted to a kuid and stored on disk. (Seth, I'm not sure offhand how that should mesh with your patches, we can talk about it after I send the next patch, which I'm quite certain will handle it wrongly) 8. If a security.capability exists, it will override any security.nscapability at execve() (so, inverse of my previous two patches). -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Serge E. Hallyn (se...@hallyn.com): ... > There's a problem though. The above suffices to prevent an unprivileged user > in a user_ns from unsharing a user_ns to write a file capability and exploit > that capability in the ns where he is unprivileged. With one exception, which > is the case where the unprivileged user is mapped to the same kuid which > created the namespace. So if uid 1000 on the host creates a namespace > where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace > can create a new user_ns, write the xattr, and exploit it from the > parent namespace. This is not an uncommon case. I'm not sure what to do > about > it. Ok I think I've convinced myself that requiring a kuid 0 in the container and storing that in the security.nscapability is best solution. The DAC objection is imo not really valid - we don't have to give uid 0 in the container any special privilege, we just require that the ns have a uid 0 mapping. I have not been able to think of any other reliable way to verify that the writer of the capability is authorized to grant privilege to the file when executed by current. I'm going to proceed with another POC based on the following design: 1. no new syscalls at the moment. You can choose to set/query security.nscapability, but can also just set security.capability from a user_ns and have the kernel transparently set a security.nscapability entry for you. 2. For now just a single security.nscapability entry, but in a format that turning it into an array will be a trivial change 3. When running file foo which has a security.nscapability for kuid 10, then any namespace where kuid 10 is root - or which has an ancestor ns where that is the case - will run the file with the listed capabilities. 4. When doing getxattr of security.capability from a user_ns, if there is a security.capability entry, that will be returned; else if there is a valid security.nscapability for your ns, that will be returned. 5. when doing a setxattr of security.capability from a user_ns, if there is a security.nscapability entry, you get EBUSY; else a security.nscapability with your root kuid will be written provided that (a) you are privileged over your namespace, (b) you are privileged over your root uid, (c) the file owner maps into your namespace. 6. when doing a getxattr of security.nscapability, the entry will be shown with kuid mapped into your namespace or -1 if the uid does not map into your ns. 7. when doing a setxattr of security.nscapability, if an entry exists, you get -EBUSY; if you are not privileged over your ns, your root uid, and the file owner, then you get -EPERM; the xattr includes a uid field, which must be either 0 or a value valid in your ns. The value will be converted to a kuid and stored on disk. (Seth, I'm not sure offhand how that should mesh with your patches, we can talk about it after I send the next patch, which I'm quite certain will handle it wrongly) 8. If a security.capability exists, it will override any security.nscapability at execve() (so, inverse of my previous two patches). -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Jann Horn (j...@thejh.net): > On Tue, May 03, 2016 at 12:54:40AM -0500, Eric W. Biederman wrote: > > "Serge E. Hallyn"writes: > > > > > Quoting Andrew G. Morgan (mor...@kernel.org): > > >> > > >> I guess I'm confused how we have strayed so far that this isn't an > > >> obvious > > >> requirement. Uid=0 as being the root of privilege was the basic problem > > >> that capabilities were designed to change. > > > > > > The task executing the file can be any uid mapped into the namespace. The > > > file only has to be owned by the root of the user_ns. Which I agree is > > > unfortunate. We can work around it by putting the root uid into the xattr > > > itself (which still isn't orthogonal but allows the file to at least by > > > owned by non-root), but the problem then is that a task needs to know its > > > global root k_uid just to write the xattr. > > > > The root kuid is just make_kuids(user_ns, 0) so it is easy to find. > > > > It might be a hair better to use the userns->owner instead of the root > > uid. That would allow user namespaces without a mapped root to still > > use file capabilities. > > Please don't do that. A user might want to create multiple containers with > isolated security properties, and in that case, it would be bad if binaries > that are capable in one container are also automatically valid in the user's > other containers. But no, if the namespaces both created by uid 1000 have disjoint uid mappings, say 10-165535 and 20-265536, then a file capability on a file owned by 20 would not be active when the exec()ing task has only 10-165535 mapped. If the uid mappings are not completely disjoint, then you cannot assume that the user wanted the mappings to be disjoint. In particular, while setting up a rootfs for a container I'll frequently use small ad-hoc namespaces to chown files. For instance, my intended final container mapping may be 65536 uids starting at 10, but to chown a file to uid 5 in the container I may create a ns with kuid 1000 as uid 0 and 15 as uid 1. Here the root uid doing the writing is not even mapped into the final container namespace. So a new approach might be to (a) note the kuid which created the namespace in the xattr (magically written by the kernel at xattr write time), then say that for the file capability to take effect, two things must hold: 1. the kuid noted in the xattr must match the kuid which created the calling task's user_ns (or any ancestor creator) 2. the file uid must map into the calling task's namespace To write the filecap, 1. the task must be privileged over the uid which owns the file (in the sense of capable_wrt_inode_uidgid) 2. the task must be privileged over his own user_ns As Eric said, this should address Andrew Morgan's concern about requiring that the file be owned by uid 0 in the namespace. There's a problem though. The above suffices to prevent an unprivileged user in a user_ns from unsharing a user_ns to write a file capability and exploit that capability in the ns where he is unprivileged. With one exception, which is the case where the unprivileged user is mapped to the same kuid which created the namespace. So if uid 1000 on the host creates a namespace where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace can create a new user_ns, write the xattr, and exploit it from the parent namespace. This is not an uncommon case. I'm not sure what to do about it. > Also, this would mean that in an owner!=root scenario, container root can't > setcap executables and needs to ask the administrator of the surrounding > system > to do it. > (Of course, this could be worked around using a dummy userns layer between the > init ns and the container, but I don't like seeing new reasons for such a > hack.)
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Jann Horn (j...@thejh.net): > On Tue, May 03, 2016 at 12:54:40AM -0500, Eric W. Biederman wrote: > > "Serge E. Hallyn" writes: > > > > > Quoting Andrew G. Morgan (mor...@kernel.org): > > >> > > >> I guess I'm confused how we have strayed so far that this isn't an > > >> obvious > > >> requirement. Uid=0 as being the root of privilege was the basic problem > > >> that capabilities were designed to change. > > > > > > The task executing the file can be any uid mapped into the namespace. The > > > file only has to be owned by the root of the user_ns. Which I agree is > > > unfortunate. We can work around it by putting the root uid into the xattr > > > itself (which still isn't orthogonal but allows the file to at least by > > > owned by non-root), but the problem then is that a task needs to know its > > > global root k_uid just to write the xattr. > > > > The root kuid is just make_kuids(user_ns, 0) so it is easy to find. > > > > It might be a hair better to use the userns->owner instead of the root > > uid. That would allow user namespaces without a mapped root to still > > use file capabilities. > > Please don't do that. A user might want to create multiple containers with > isolated security properties, and in that case, it would be bad if binaries > that are capable in one container are also automatically valid in the user's > other containers. But no, if the namespaces both created by uid 1000 have disjoint uid mappings, say 10-165535 and 20-265536, then a file capability on a file owned by 20 would not be active when the exec()ing task has only 10-165535 mapped. If the uid mappings are not completely disjoint, then you cannot assume that the user wanted the mappings to be disjoint. In particular, while setting up a rootfs for a container I'll frequently use small ad-hoc namespaces to chown files. For instance, my intended final container mapping may be 65536 uids starting at 10, but to chown a file to uid 5 in the container I may create a ns with kuid 1000 as uid 0 and 15 as uid 1. Here the root uid doing the writing is not even mapped into the final container namespace. So a new approach might be to (a) note the kuid which created the namespace in the xattr (magically written by the kernel at xattr write time), then say that for the file capability to take effect, two things must hold: 1. the kuid noted in the xattr must match the kuid which created the calling task's user_ns (or any ancestor creator) 2. the file uid must map into the calling task's namespace To write the filecap, 1. the task must be privileged over the uid which owns the file (in the sense of capable_wrt_inode_uidgid) 2. the task must be privileged over his own user_ns As Eric said, this should address Andrew Morgan's concern about requiring that the file be owned by uid 0 in the namespace. There's a problem though. The above suffices to prevent an unprivileged user in a user_ns from unsharing a user_ns to write a file capability and exploit that capability in the ns where he is unprivileged. With one exception, which is the case where the unprivileged user is mapped to the same kuid which created the namespace. So if uid 1000 on the host creates a namespace where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace can create a new user_ns, write the xattr, and exploit it from the parent namespace. This is not an uncommon case. I'm not sure what to do about it. > Also, this would mean that in an owner!=root scenario, container root can't > setcap executables and needs to ask the administrator of the surrounding > system > to do it. > (Of course, this could be worked around using a dummy userns layer between the > init ns and the container, but I don't like seeing new reasons for such a > hack.)
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Serge E. Hallyn (se...@hallyn.com): > Quoting Eric W. Biederman (ebied...@xmission.com): > > "Serge E. Hallyn"writes: > > > > > Quoting Andrew G. Morgan (mor...@kernel.org): > > >> > > >> I guess I'm confused how we have strayed so far that this isn't an > > >> obvious > > >> requirement. Uid=0 as being the root of privilege was the basic problem > > >> that capabilities were designed to change. > > > > > > The task executing the file can be any uid mapped into the namespace. The > > > file only has to be owned by the root of the user_ns. Which I agree is > > > unfortunate. We can work around it by putting the root uid into the xattr > > > itself (which still isn't orthogonal but allows the file to at least by > > > owned by non-root), but the problem then is that a task needs to know its > > > global root k_uid just to write the xattr. > > > > The root kuid is just make_kuids(user_ns, 0) so it is easy to find. > > > > It might be a hair better to use the userns->owner instead of the root > > uid. That would allow user namespaces without a mapped root to still > > use file capabilities. > > That's all fine if the kernel does it for us magically. Which is what we're > talking about below. Above I was talking about userspace putting it into > the xattr. > > > >> Uid is an acl concept. Capabilities are supposed to be independent of > > >> that. > > >> > > >> If we want to support NS file capabilities I would look at replacing the > > >> xattr syscall with a dedicated file capabilities modification syscall. > > >> Then > > > > > > That was one ofthe possibilities I'd mentioned in my earlier proposal, > > > fwiw. The problem is if we want tar to still work unmodified then > > > simple xattr operations still have to work. > > > > > > Maybe there's workable semantics there though. Worth thinking about. > > > > If the problem is compatibilty please look at > > posix_acl_fix_xattr_from_user. With something similar for the > > All right. Excellent. I simply didn't think something like that would > be acceptable. I tend to think of xattrs as just out of band file contents, > but generally under user control. I guess that's not right. > > > security.capability attribute we can perform whatever transformation > > makes sense. I admit adding 4 bytes is a bit of a pain in that context > > but not a big one. > > If we can do all the magic in the kernel behind the scenes, then I > absolutely do not mind adding a new security.capability version with 4 > more bytes. Userspace can just write the old xattr format with the new > version number, kernel fills in the userns owner kuid. It's what I > originally wanted to do, but didn't think was acceptable. > > Sounds great! So I'm still mulling this over and still undecided as to whether we want to 1. leave the xattr as is and use a new pair of syscalls for setting/unsetting filecaps. This would truly let us hide the implementation detail of the file having to be owned by root (apart from returning a perhaps-unexpected EPERM when file isn't owned by uid 0, and documenting that as something that can be changed later) 2. hide the magic in get/setxattr of security.capability. And if we do that, then whether to hide the security.nscapability (or newer-version security.capbility if that's what we do). probably not hide it... -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Serge E. Hallyn (se...@hallyn.com): > Quoting Eric W. Biederman (ebied...@xmission.com): > > "Serge E. Hallyn" writes: > > > > > Quoting Andrew G. Morgan (mor...@kernel.org): > > >> > > >> I guess I'm confused how we have strayed so far that this isn't an > > >> obvious > > >> requirement. Uid=0 as being the root of privilege was the basic problem > > >> that capabilities were designed to change. > > > > > > The task executing the file can be any uid mapped into the namespace. The > > > file only has to be owned by the root of the user_ns. Which I agree is > > > unfortunate. We can work around it by putting the root uid into the xattr > > > itself (which still isn't orthogonal but allows the file to at least by > > > owned by non-root), but the problem then is that a task needs to know its > > > global root k_uid just to write the xattr. > > > > The root kuid is just make_kuids(user_ns, 0) so it is easy to find. > > > > It might be a hair better to use the userns->owner instead of the root > > uid. That would allow user namespaces without a mapped root to still > > use file capabilities. > > That's all fine if the kernel does it for us magically. Which is what we're > talking about below. Above I was talking about userspace putting it into > the xattr. > > > >> Uid is an acl concept. Capabilities are supposed to be independent of > > >> that. > > >> > > >> If we want to support NS file capabilities I would look at replacing the > > >> xattr syscall with a dedicated file capabilities modification syscall. > > >> Then > > > > > > That was one ofthe possibilities I'd mentioned in my earlier proposal, > > > fwiw. The problem is if we want tar to still work unmodified then > > > simple xattr operations still have to work. > > > > > > Maybe there's workable semantics there though. Worth thinking about. > > > > If the problem is compatibilty please look at > > posix_acl_fix_xattr_from_user. With something similar for the > > All right. Excellent. I simply didn't think something like that would > be acceptable. I tend to think of xattrs as just out of band file contents, > but generally under user control. I guess that's not right. > > > security.capability attribute we can perform whatever transformation > > makes sense. I admit adding 4 bytes is a bit of a pain in that context > > but not a big one. > > If we can do all the magic in the kernel behind the scenes, then I > absolutely do not mind adding a new security.capability version with 4 > more bytes. Userspace can just write the old xattr format with the new > version number, kernel fills in the userns owner kuid. It's what I > originally wanted to do, but didn't think was acceptable. > > Sounds great! So I'm still mulling this over and still undecided as to whether we want to 1. leave the xattr as is and use a new pair of syscalls for setting/unsetting filecaps. This would truly let us hide the implementation detail of the file having to be owned by root (apart from returning a perhaps-unexpected EPERM when file isn't owned by uid 0, and documenting that as something that can be changed later) 2. hide the magic in get/setxattr of security.capability. And if we do that, then whether to hide the security.nscapability (or newer-version security.capbility if that's what we do). probably not hide it... -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Eric W. Biederman (ebied...@xmission.com): > "Andrew G. Morgan"writes: > > > On 2 May 2016 6:04 p.m., "Eric W. Biederman" > > wrote: > >> > >> "Serge E. Hallyn" writes: > >> > >> > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: > >> >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn > > wrote: > >> >> > Quoting Kees Cook (keesc...@chromium.org): > >> >> >> On Fri, Apr 22, 2016 at 10:26 AM, > > wrote: > >> >> >> > From: Serge Hallyn > >> > ... > >> >> >> This looks like userspace must knowingly be aware that it is > > in a > >> >> >> namespace and to DTRT instead of it being translated by the > > kernel > >> >> >> when setxattr is called under !init_user_ns? > >> >> > > >> >> > Yes - my libcap2 patch checks /proc/self/uid_map to decide > > that. If that > >> >> > shows you are in init_user_ns then it uses security.capability, > > otherwise > >> >> > it uses security.nscapability. > >> >> > > >> >> > I've occasionally considered having the xattr code do the quiet > >> >> > substitution if need be. > >> >> > > >> >> > In fact, much of this structure comes from when I was still > > trying to > >> >> > do multiple values per xattr. Given what we're doing here, we > > could > >> >> > keep the xattr contents exactly the same, just changing the > > name. > >> >> > So userspace could just get and set security.capability; if you > > are > >> >> > in a non-init user_ns, if security.capability is set then you > > cannot > >> >> > set it; if security.capability is not set, then the kernel > > writes > >> >> > security.nscapability instead and returns success. > >> >> > > >> >> > I don't like magic, but this might be just straightforward > > enough > >> >> > to not be offensive. Thoughts? > >> >> > >> >> Yeah, I think it might be better to have the magic in this case, > > since > >> >> it seems weird to just reject setxattr if a tool didn't realize > > it was > >> >> in a namespace. I'm not sure -- it is also nice to have an > > explicit > >> >> API here. > >> >> > >> >> I would defer to Eric or Michael on that. I keep going back and > > forth, > >> >> though I suspect it's probably best to do what you already have > >> >> (explicit API). > >> > > >> > Michael, Eric, what do you think? The choice we're making here is > >> > whether we should > >> > > >> > 1. Keep a nice simple separate pair of xattrs, the pre-existing > >> > security.capability which can only be written from init_user_ns, > >> > and the new (in this patch) security.nscapability which you can > >> > write to any file where you are privileged wrt the file. > >> > > >> > 2. Make security.capability somewhat 'magic' - if someone in a > >> > non-initial user ns tries to write it and has privilege wrt the > >> > file, then the kernel silently writes security.nscapability > > instead. > >> > > >> > The biggest drawback of (1) would be any tar-like program trying > >> > to restore a file which had security.capability, needing to know > >> > to detect its userns and write the security.nscapability instead. > >> > The drawback of (2) is ~\o/~ magic. > >> > >> Apologies for not having followed this more closely before. > >> > >> I don't like either option. I think we will be in much better shape > > if > >> we upgrade the capability xattr. It seems totally wrong or at least > >> confusing for a file to have both capability xattrs. > >> > >> Just using security.capability allows us to confront any weird > > issues > >> with mixing both the old semantics and the new semantics. > >> > >> We had previously discussioned extending the capbility a little and > >> adding a uid who needed to be the root uid in a user namespace, to > > be > >> valid. Using the owner of the file seems simpler, and even a little > >> more transparent as this makes the security.capability xattr a > > limited > >> form of setuid (which it semantically is). > >> > >> So I believe the new semantics in general are an improvement. > >> > >> > >> Given the expected use case let me ask as simple question: Are there > > any > >> known cases where the owner of a setcap exectuable is not root? > >> > >> I expect the pile of setcap exectuables is small enough we can go > >> through the top distros and look at all of the setcap executlables. > >> > >> > >> If there is not a need to support setcap executables owned by > > non-root, > >> I suspect the right play is to just change the semantics to always > > treat > >> the security.capability attribute this way. > >> > > > > I guess I'm confused how we have strayed so far that this isn't an > > obvious requirement. Uid=0 as being the root of privilege was the > > basic problem that capabilities were designed to change. > > uid==0 as the owner of a file is slightly different from uid==0 of a > running process. Last I checked if it is installed as part of a > distribution the programs are owned by root
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Eric W. Biederman (ebied...@xmission.com): > "Andrew G. Morgan" writes: > > > On 2 May 2016 6:04 p.m., "Eric W. Biederman" > > wrote: > >> > >> "Serge E. Hallyn" writes: > >> > >> > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: > >> >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn > > wrote: > >> >> > Quoting Kees Cook (keesc...@chromium.org): > >> >> >> On Fri, Apr 22, 2016 at 10:26 AM, > > wrote: > >> >> >> > From: Serge Hallyn > >> > ... > >> >> >> This looks like userspace must knowingly be aware that it is > > in a > >> >> >> namespace and to DTRT instead of it being translated by the > > kernel > >> >> >> when setxattr is called under !init_user_ns? > >> >> > > >> >> > Yes - my libcap2 patch checks /proc/self/uid_map to decide > > that. If that > >> >> > shows you are in init_user_ns then it uses security.capability, > > otherwise > >> >> > it uses security.nscapability. > >> >> > > >> >> > I've occasionally considered having the xattr code do the quiet > >> >> > substitution if need be. > >> >> > > >> >> > In fact, much of this structure comes from when I was still > > trying to > >> >> > do multiple values per xattr. Given what we're doing here, we > > could > >> >> > keep the xattr contents exactly the same, just changing the > > name. > >> >> > So userspace could just get and set security.capability; if you > > are > >> >> > in a non-init user_ns, if security.capability is set then you > > cannot > >> >> > set it; if security.capability is not set, then the kernel > > writes > >> >> > security.nscapability instead and returns success. > >> >> > > >> >> > I don't like magic, but this might be just straightforward > > enough > >> >> > to not be offensive. Thoughts? > >> >> > >> >> Yeah, I think it might be better to have the magic in this case, > > since > >> >> it seems weird to just reject setxattr if a tool didn't realize > > it was > >> >> in a namespace. I'm not sure -- it is also nice to have an > > explicit > >> >> API here. > >> >> > >> >> I would defer to Eric or Michael on that. I keep going back and > > forth, > >> >> though I suspect it's probably best to do what you already have > >> >> (explicit API). > >> > > >> > Michael, Eric, what do you think? The choice we're making here is > >> > whether we should > >> > > >> > 1. Keep a nice simple separate pair of xattrs, the pre-existing > >> > security.capability which can only be written from init_user_ns, > >> > and the new (in this patch) security.nscapability which you can > >> > write to any file where you are privileged wrt the file. > >> > > >> > 2. Make security.capability somewhat 'magic' - if someone in a > >> > non-initial user ns tries to write it and has privilege wrt the > >> > file, then the kernel silently writes security.nscapability > > instead. > >> > > >> > The biggest drawback of (1) would be any tar-like program trying > >> > to restore a file which had security.capability, needing to know > >> > to detect its userns and write the security.nscapability instead. > >> > The drawback of (2) is ~\o/~ magic. > >> > >> Apologies for not having followed this more closely before. > >> > >> I don't like either option. I think we will be in much better shape > > if > >> we upgrade the capability xattr. It seems totally wrong or at least > >> confusing for a file to have both capability xattrs. > >> > >> Just using security.capability allows us to confront any weird > > issues > >> with mixing both the old semantics and the new semantics. > >> > >> We had previously discussioned extending the capbility a little and > >> adding a uid who needed to be the root uid in a user namespace, to > > be > >> valid. Using the owner of the file seems simpler, and even a little > >> more transparent as this makes the security.capability xattr a > > limited > >> form of setuid (which it semantically is). > >> > >> So I believe the new semantics in general are an improvement. > >> > >> > >> Given the expected use case let me ask as simple question: Are there > > any > >> known cases where the owner of a setcap exectuable is not root? > >> > >> I expect the pile of setcap exectuables is small enough we can go > >> through the top distros and look at all of the setcap executlables. > >> > >> > >> If there is not a need to support setcap executables owned by > > non-root, > >> I suspect the right play is to just change the semantics to always > > treat > >> the security.capability attribute this way. > >> > > > > I guess I'm confused how we have strayed so far that this isn't an > > obvious requirement. Uid=0 as being the root of privilege was the > > basic problem that capabilities were designed to change. > > uid==0 as the owner of a file is slightly different from uid==0 of a > running process. Last I checked if it is installed as part of a > distribution the programs are owned by root by default. Note that this does mean that a user namespace without a mapping for a uid 0 cannot use file capabilities. But I'm
Re: [PATCH 1/1] simplified security.nscapability xattr
On Tue, May 03, 2016 at 12:54:40AM -0500, Eric W. Biederman wrote: > "Serge E. Hallyn"writes: > > > Quoting Andrew G. Morgan (mor...@kernel.org): > >> > >> I guess I'm confused how we have strayed so far that this isn't an obvious > >> requirement. Uid=0 as being the root of privilege was the basic problem > >> that capabilities were designed to change. > > > > The task executing the file can be any uid mapped into the namespace. The > > file only has to be owned by the root of the user_ns. Which I agree is > > unfortunate. We can work around it by putting the root uid into the xattr > > itself (which still isn't orthogonal but allows the file to at least by > > owned by non-root), but the problem then is that a task needs to know its > > global root k_uid just to write the xattr. > > The root kuid is just make_kuids(user_ns, 0) so it is easy to find. > > It might be a hair better to use the userns->owner instead of the root > uid. That would allow user namespaces without a mapped root to still > use file capabilities. Please don't do that. A user might want to create multiple containers with isolated security properties, and in that case, it would be bad if binaries that are capable in one container are also automatically valid in the user's other containers. Also, this would mean that in an owner!=root scenario, container root can't setcap executables and needs to ask the administrator of the surrounding system to do it. (Of course, this could be worked around using a dummy userns layer between the init ns and the container, but I don't like seeing new reasons for such a hack.) signature.asc Description: Digital signature
Re: [PATCH 1/1] simplified security.nscapability xattr
On Tue, May 03, 2016 at 12:54:40AM -0500, Eric W. Biederman wrote: > "Serge E. Hallyn" writes: > > > Quoting Andrew G. Morgan (mor...@kernel.org): > >> > >> I guess I'm confused how we have strayed so far that this isn't an obvious > >> requirement. Uid=0 as being the root of privilege was the basic problem > >> that capabilities were designed to change. > > > > The task executing the file can be any uid mapped into the namespace. The > > file only has to be owned by the root of the user_ns. Which I agree is > > unfortunate. We can work around it by putting the root uid into the xattr > > itself (which still isn't orthogonal but allows the file to at least by > > owned by non-root), but the problem then is that a task needs to know its > > global root k_uid just to write the xattr. > > The root kuid is just make_kuids(user_ns, 0) so it is easy to find. > > It might be a hair better to use the userns->owner instead of the root > uid. That would allow user namespaces without a mapped root to still > use file capabilities. Please don't do that. A user might want to create multiple containers with isolated security properties, and in that case, it would be bad if binaries that are capable in one container are also automatically valid in the user's other containers. Also, this would mean that in an owner!=root scenario, container root can't setcap executables and needs to ask the administrator of the surrounding system to do it. (Of course, this could be worked around using a dummy userns layer between the init ns and the container, but I don't like seeing new reasons for such a hack.) signature.asc Description: Digital signature
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Eric W. Biederman (ebied...@xmission.com): > "Serge E. Hallyn"writes: > > > Quoting Andrew G. Morgan (mor...@kernel.org): > >> > >> I guess I'm confused how we have strayed so far that this isn't an obvious > >> requirement. Uid=0 as being the root of privilege was the basic problem > >> that capabilities were designed to change. > > > > The task executing the file can be any uid mapped into the namespace. The > > file only has to be owned by the root of the user_ns. Which I agree is > > unfortunate. We can work around it by putting the root uid into the xattr > > itself (which still isn't orthogonal but allows the file to at least by > > owned by non-root), but the problem then is that a task needs to know its > > global root k_uid just to write the xattr. > > The root kuid is just make_kuids(user_ns, 0) so it is easy to find. > > It might be a hair better to use the userns->owner instead of the root > uid. That would allow user namespaces without a mapped root to still > use file capabilities. That's all fine if the kernel does it for us magically. Which is what we're talking about below. Above I was talking about userspace putting it into the xattr. > >> Uid is an acl concept. Capabilities are supposed to be independent of that. > >> > >> If we want to support NS file capabilities I would look at replacing the > >> xattr syscall with a dedicated file capabilities modification syscall. Then > > > > That was one ofthe possibilities I'd mentioned in my earlier proposal, > > fwiw. The problem is if we want tar to still work unmodified then > > simple xattr operations still have to work. > > > > Maybe there's workable semantics there though. Worth thinking about. > > If the problem is compatibilty please look at > posix_acl_fix_xattr_from_user. With something similar for the All right. Excellent. I simply didn't think something like that would be acceptable. I tend to think of xattrs as just out of band file contents, but generally under user control. I guess that's not right. > security.capability attribute we can perform whatever transformation > makes sense. I admit adding 4 bytes is a bit of a pain in that context > but not a big one. If we can do all the magic in the kernel behind the scenes, then I absolutely do not mind adding a new security.capability version with 4 more bytes. Userspace can just write the old xattr format with the new version number, kernel fills in the userns owner kuid. It's what I originally wanted to do, but didn't think was acceptable. Sounds great! -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Eric W. Biederman (ebied...@xmission.com): > "Serge E. Hallyn" writes: > > > Quoting Andrew G. Morgan (mor...@kernel.org): > >> > >> I guess I'm confused how we have strayed so far that this isn't an obvious > >> requirement. Uid=0 as being the root of privilege was the basic problem > >> that capabilities were designed to change. > > > > The task executing the file can be any uid mapped into the namespace. The > > file only has to be owned by the root of the user_ns. Which I agree is > > unfortunate. We can work around it by putting the root uid into the xattr > > itself (which still isn't orthogonal but allows the file to at least by > > owned by non-root), but the problem then is that a task needs to know its > > global root k_uid just to write the xattr. > > The root kuid is just make_kuids(user_ns, 0) so it is easy to find. > > It might be a hair better to use the userns->owner instead of the root > uid. That would allow user namespaces without a mapped root to still > use file capabilities. That's all fine if the kernel does it for us magically. Which is what we're talking about below. Above I was talking about userspace putting it into the xattr. > >> Uid is an acl concept. Capabilities are supposed to be independent of that. > >> > >> If we want to support NS file capabilities I would look at replacing the > >> xattr syscall with a dedicated file capabilities modification syscall. Then > > > > That was one ofthe possibilities I'd mentioned in my earlier proposal, > > fwiw. The problem is if we want tar to still work unmodified then > > simple xattr operations still have to work. > > > > Maybe there's workable semantics there though. Worth thinking about. > > If the problem is compatibilty please look at > posix_acl_fix_xattr_from_user. With something similar for the All right. Excellent. I simply didn't think something like that would be acceptable. I tend to think of xattrs as just out of band file contents, but generally under user control. I guess that's not right. > security.capability attribute we can perform whatever transformation > makes sense. I admit adding 4 bytes is a bit of a pain in that context > but not a big one. If we can do all the magic in the kernel behind the scenes, then I absolutely do not mind adding a new security.capability version with 4 more bytes. Userspace can just write the old xattr format with the new version number, kernel fills in the userns owner kuid. It's what I originally wanted to do, but didn't think was acceptable. Sounds great! -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
"Serge E. Hallyn"writes: > Quoting Andrew G. Morgan (mor...@kernel.org): >> >> I guess I'm confused how we have strayed so far that this isn't an obvious >> requirement. Uid=0 as being the root of privilege was the basic problem >> that capabilities were designed to change. > > The task executing the file can be any uid mapped into the namespace. The > file only has to be owned by the root of the user_ns. Which I agree is > unfortunate. We can work around it by putting the root uid into the xattr > itself (which still isn't orthogonal but allows the file to at least by > owned by non-root), but the problem then is that a task needs to know its > global root k_uid just to write the xattr. The root kuid is just make_kuids(user_ns, 0) so it is easy to find. It might be a hair better to use the userns->owner instead of the root uid. That would allow user namespaces without a mapped root to still use file capabilities. >> Uid is an acl concept. Capabilities are supposed to be independent of that. >> >> If we want to support NS file capabilities I would look at replacing the >> xattr syscall with a dedicated file capabilities modification syscall. Then > > That was one ofthe possibilities I'd mentioned in my earlier proposal, > fwiw. The problem is if we want tar to still work unmodified then > simple xattr operations still have to work. > > Maybe there's workable semantics there though. Worth thinking about. If the problem is compatibilty please look at posix_acl_fix_xattr_from_user. With something similar for the security.capability attribute we can perform whatever transformation makes sense. I admit adding 4 bytes is a bit of a pain in that context but not a big one. Eric
Re: [PATCH 1/1] simplified security.nscapability xattr
"Serge E. Hallyn" writes: > Quoting Andrew G. Morgan (mor...@kernel.org): >> >> I guess I'm confused how we have strayed so far that this isn't an obvious >> requirement. Uid=0 as being the root of privilege was the basic problem >> that capabilities were designed to change. > > The task executing the file can be any uid mapped into the namespace. The > file only has to be owned by the root of the user_ns. Which I agree is > unfortunate. We can work around it by putting the root uid into the xattr > itself (which still isn't orthogonal but allows the file to at least by > owned by non-root), but the problem then is that a task needs to know its > global root k_uid just to write the xattr. The root kuid is just make_kuids(user_ns, 0) so it is easy to find. It might be a hair better to use the userns->owner instead of the root uid. That would allow user namespaces without a mapped root to still use file capabilities. >> Uid is an acl concept. Capabilities are supposed to be independent of that. >> >> If we want to support NS file capabilities I would look at replacing the >> xattr syscall with a dedicated file capabilities modification syscall. Then > > That was one ofthe possibilities I'd mentioned in my earlier proposal, > fwiw. The problem is if we want tar to still work unmodified then > simple xattr operations still have to work. > > Maybe there's workable semantics there though. Worth thinking about. If the problem is compatibilty please look at posix_acl_fix_xattr_from_user. With something similar for the security.capability attribute we can perform whatever transformation makes sense. I admit adding 4 bytes is a bit of a pain in that context but not a big one. Eric
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Andrew G. Morgan (mor...@kernel.org): > On 2 May 2016 6:04 p.m., "Eric W. Biederman"wrote: > > > > "Serge E. Hallyn" writes: > > > > > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: > > >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn > wrote: > > >> > Quoting Kees Cook (keesc...@chromium.org): > > >> >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: > > >> >> > From: Serge Hallyn > > > ... > > >> >> This looks like userspace must knowingly be aware that it is in a > > >> >> namespace and to DTRT instead of it being translated by the kernel > > >> >> when setxattr is called under !init_user_ns? > > >> > > > >> > Yes - my libcap2 patch checks /proc/self/uid_map to decide that. If > that > > >> > shows you are in init_user_ns then it uses security.capability, > otherwise > > >> > it uses security.nscapability. > > >> > > > >> > I've occasionally considered having the xattr code do the quiet > > >> > substitution if need be. > > >> > > > >> > In fact, much of this structure comes from when I was still trying to > > >> > do multiple values per xattr. Given what we're doing here, we could > > >> > keep the xattr contents exactly the same, just changing the name. > > >> > So userspace could just get and set security.capability; if you are > > >> > in a non-init user_ns, if security.capability is set then you cannot > > >> > set it; if security.capability is not set, then the kernel writes > > >> > security.nscapability instead and returns success. > > >> > > > >> > I don't like magic, but this might be just straightforward enough > > >> > to not be offensive. Thoughts? > > >> > > >> Yeah, I think it might be better to have the magic in this case, since > > >> it seems weird to just reject setxattr if a tool didn't realize it was > > >> in a namespace. I'm not sure -- it is also nice to have an explicit > > >> API here. > > >> > > >> I would defer to Eric or Michael on that. I keep going back and forth, > > >> though I suspect it's probably best to do what you already have > > >> (explicit API). > > > > > > Michael, Eric, what do you think? The choice we're making here is > > > whether we should > > > > > > 1. Keep a nice simple separate pair of xattrs, the pre-existing > > > security.capability which can only be written from init_user_ns, > > > and the new (in this patch) security.nscapability which you can > > > write to any file where you are privileged wrt the file. > > > > > > 2. Make security.capability somewhat 'magic' - if someone in a > > > non-initial user ns tries to write it and has privilege wrt the > > > file, then the kernel silently writes security.nscapability instead. > > > > > > The biggest drawback of (1) would be any tar-like program trying > > > to restore a file which had security.capability, needing to know > > > to detect its userns and write the security.nscapability instead. > > > The drawback of (2) is ~\o/~ magic. > > > > Apologies for not having followed this more closely before. > > > > I don't like either option. I think we will be in much better shape if > > we upgrade the capability xattr. It seems totally wrong or at least > > confusing for a file to have both capability xattrs. > > > > Just using security.capability allows us to confront any weird issues > > with mixing both the old semantics and the new semantics. > > > > We had previously discussioned extending the capbility a little and > > adding a uid who needed to be the root uid in a user namespace, to be > > valid. Using the owner of the file seems simpler, and even a little > > more transparent as this makes the security.capability xattr a limited > > form of setuid (which it semantically is). > > > > So I believe the new semantics in general are an improvement. > > > > > > Given the expected use case let me ask as simple question: Are there any > > known cases where the owner of a setcap exectuable is not root? > > > > I expect the pile of setcap exectuables is small enough we can go > > through the top distros and look at all of the setcap executlables. > > > > > > If there is not a need to support setcap executables owned by non-root, > > I suspect the right play is to just change the semantics to always treat > > the security.capability attribute this way. > > > > I guess I'm confused how we have strayed so far that this isn't an obvious > requirement. Uid=0 as being the root of privilege was the basic problem > that capabilities were designed to change. The task executing the file can be any uid mapped into the namespace. The file only has to be owned by the root of the user_ns. Which I agree is unfortunate. We can work around it by putting the root uid into the xattr itself (which still isn't orthogonal but allows the file to at least by owned by non-root), but the problem then is that a task needs to know its global root k_uid just to write the xattr. > Uid is an
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Andrew G. Morgan (mor...@kernel.org): > On 2 May 2016 6:04 p.m., "Eric W. Biederman" wrote: > > > > "Serge E. Hallyn" writes: > > > > > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: > > >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn > wrote: > > >> > Quoting Kees Cook (keesc...@chromium.org): > > >> >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: > > >> >> > From: Serge Hallyn > > > ... > > >> >> This looks like userspace must knowingly be aware that it is in a > > >> >> namespace and to DTRT instead of it being translated by the kernel > > >> >> when setxattr is called under !init_user_ns? > > >> > > > >> > Yes - my libcap2 patch checks /proc/self/uid_map to decide that. If > that > > >> > shows you are in init_user_ns then it uses security.capability, > otherwise > > >> > it uses security.nscapability. > > >> > > > >> > I've occasionally considered having the xattr code do the quiet > > >> > substitution if need be. > > >> > > > >> > In fact, much of this structure comes from when I was still trying to > > >> > do multiple values per xattr. Given what we're doing here, we could > > >> > keep the xattr contents exactly the same, just changing the name. > > >> > So userspace could just get and set security.capability; if you are > > >> > in a non-init user_ns, if security.capability is set then you cannot > > >> > set it; if security.capability is not set, then the kernel writes > > >> > security.nscapability instead and returns success. > > >> > > > >> > I don't like magic, but this might be just straightforward enough > > >> > to not be offensive. Thoughts? > > >> > > >> Yeah, I think it might be better to have the magic in this case, since > > >> it seems weird to just reject setxattr if a tool didn't realize it was > > >> in a namespace. I'm not sure -- it is also nice to have an explicit > > >> API here. > > >> > > >> I would defer to Eric or Michael on that. I keep going back and forth, > > >> though I suspect it's probably best to do what you already have > > >> (explicit API). > > > > > > Michael, Eric, what do you think? The choice we're making here is > > > whether we should > > > > > > 1. Keep a nice simple separate pair of xattrs, the pre-existing > > > security.capability which can only be written from init_user_ns, > > > and the new (in this patch) security.nscapability which you can > > > write to any file where you are privileged wrt the file. > > > > > > 2. Make security.capability somewhat 'magic' - if someone in a > > > non-initial user ns tries to write it and has privilege wrt the > > > file, then the kernel silently writes security.nscapability instead. > > > > > > The biggest drawback of (1) would be any tar-like program trying > > > to restore a file which had security.capability, needing to know > > > to detect its userns and write the security.nscapability instead. > > > The drawback of (2) is ~\o/~ magic. > > > > Apologies for not having followed this more closely before. > > > > I don't like either option. I think we will be in much better shape if > > we upgrade the capability xattr. It seems totally wrong or at least > > confusing for a file to have both capability xattrs. > > > > Just using security.capability allows us to confront any weird issues > > with mixing both the old semantics and the new semantics. > > > > We had previously discussioned extending the capbility a little and > > adding a uid who needed to be the root uid in a user namespace, to be > > valid. Using the owner of the file seems simpler, and even a little > > more transparent as this makes the security.capability xattr a limited > > form of setuid (which it semantically is). > > > > So I believe the new semantics in general are an improvement. > > > > > > Given the expected use case let me ask as simple question: Are there any > > known cases where the owner of a setcap exectuable is not root? > > > > I expect the pile of setcap exectuables is small enough we can go > > through the top distros and look at all of the setcap executlables. > > > > > > If there is not a need to support setcap executables owned by non-root, > > I suspect the right play is to just change the semantics to always treat > > the security.capability attribute this way. > > > > I guess I'm confused how we have strayed so far that this isn't an obvious > requirement. Uid=0 as being the root of privilege was the basic problem > that capabilities were designed to change. The task executing the file can be any uid mapped into the namespace. The file only has to be owned by the root of the user_ns. Which I agree is unfortunate. We can work around it by putting the root uid into the xattr itself (which still isn't orthogonal but allows the file to at least by owned by non-root), but the problem then is that a task needs to know its global root k_uid just to write the xattr. > Uid is an acl concept. Capabilities are supposed to be independent of that. > > If we want to support NS file
Re: [PATCH 1/1] simplified security.nscapability xattr
"Andrew G. Morgan"writes: > On 2 May 2016 6:04 p.m., "Eric W. Biederman" > wrote: >> >> "Serge E. Hallyn" writes: >> >> > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: >> >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn > wrote: >> >> > Quoting Kees Cook (keesc...@chromium.org): >> >> >> On Fri, Apr 22, 2016 at 10:26 AM, > wrote: >> >> >> > From: Serge Hallyn >> > ... >> >> >> This looks like userspace must knowingly be aware that it is > in a >> >> >> namespace and to DTRT instead of it being translated by the > kernel >> >> >> when setxattr is called under !init_user_ns? >> >> > >> >> > Yes - my libcap2 patch checks /proc/self/uid_map to decide > that. If that >> >> > shows you are in init_user_ns then it uses security.capability, > otherwise >> >> > it uses security.nscapability. >> >> > >> >> > I've occasionally considered having the xattr code do the quiet >> >> > substitution if need be. >> >> > >> >> > In fact, much of this structure comes from when I was still > trying to >> >> > do multiple values per xattr. Given what we're doing here, we > could >> >> > keep the xattr contents exactly the same, just changing the > name. >> >> > So userspace could just get and set security.capability; if you > are >> >> > in a non-init user_ns, if security.capability is set then you > cannot >> >> > set it; if security.capability is not set, then the kernel > writes >> >> > security.nscapability instead and returns success. >> >> > >> >> > I don't like magic, but this might be just straightforward > enough >> >> > to not be offensive. Thoughts? >> >> >> >> Yeah, I think it might be better to have the magic in this case, > since >> >> it seems weird to just reject setxattr if a tool didn't realize > it was >> >> in a namespace. I'm not sure -- it is also nice to have an > explicit >> >> API here. >> >> >> >> I would defer to Eric or Michael on that. I keep going back and > forth, >> >> though I suspect it's probably best to do what you already have >> >> (explicit API). >> > >> > Michael, Eric, what do you think? The choice we're making here is >> > whether we should >> > >> > 1. Keep a nice simple separate pair of xattrs, the pre-existing >> > security.capability which can only be written from init_user_ns, >> > and the new (in this patch) security.nscapability which you can >> > write to any file where you are privileged wrt the file. >> > >> > 2. Make security.capability somewhat 'magic' - if someone in a >> > non-initial user ns tries to write it and has privilege wrt the >> > file, then the kernel silently writes security.nscapability > instead. >> > >> > The biggest drawback of (1) would be any tar-like program trying >> > to restore a file which had security.capability, needing to know >> > to detect its userns and write the security.nscapability instead. >> > The drawback of (2) is ~\o/~ magic. >> >> Apologies for not having followed this more closely before. >> >> I don't like either option. I think we will be in much better shape > if >> we upgrade the capability xattr. It seems totally wrong or at least >> confusing for a file to have both capability xattrs. >> >> Just using security.capability allows us to confront any weird > issues >> with mixing both the old semantics and the new semantics. >> >> We had previously discussioned extending the capbility a little and >> adding a uid who needed to be the root uid in a user namespace, to > be >> valid. Using the owner of the file seems simpler, and even a little >> more transparent as this makes the security.capability xattr a > limited >> form of setuid (which it semantically is). >> >> So I believe the new semantics in general are an improvement. >> >> >> Given the expected use case let me ask as simple question: Are there > any >> known cases where the owner of a setcap exectuable is not root? >> >> I expect the pile of setcap exectuables is small enough we can go >> through the top distros and look at all of the setcap executlables. >> >> >> If there is not a need to support setcap executables owned by > non-root, >> I suspect the right play is to just change the semantics to always > treat >> the security.capability attribute this way. >> > > I guess I'm confused how we have strayed so far that this isn't an > obvious requirement. Uid=0 as being the root of privilege was the > basic problem that capabilities were designed to change. uid==0 as the owner of a file is slightly different from uid==0 of a running process. Last I checked if it is installed as part of a distribution the programs are owned by root by default. > Uid is an acl concept. Capabilities are supposed to be independent of > that. I don't have a clue what you mean. Posix capabilities on executables are part of discretionary access control. Whatever their rules posix capabilities are just watered down versions of the permissions
Re: [PATCH 1/1] simplified security.nscapability xattr
"Andrew G. Morgan" writes: > On 2 May 2016 6:04 p.m., "Eric W. Biederman" > wrote: >> >> "Serge E. Hallyn" writes: >> >> > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: >> >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn > wrote: >> >> > Quoting Kees Cook (keesc...@chromium.org): >> >> >> On Fri, Apr 22, 2016 at 10:26 AM, > wrote: >> >> >> > From: Serge Hallyn >> > ... >> >> >> This looks like userspace must knowingly be aware that it is > in a >> >> >> namespace and to DTRT instead of it being translated by the > kernel >> >> >> when setxattr is called under !init_user_ns? >> >> > >> >> > Yes - my libcap2 patch checks /proc/self/uid_map to decide > that. If that >> >> > shows you are in init_user_ns then it uses security.capability, > otherwise >> >> > it uses security.nscapability. >> >> > >> >> > I've occasionally considered having the xattr code do the quiet >> >> > substitution if need be. >> >> > >> >> > In fact, much of this structure comes from when I was still > trying to >> >> > do multiple values per xattr. Given what we're doing here, we > could >> >> > keep the xattr contents exactly the same, just changing the > name. >> >> > So userspace could just get and set security.capability; if you > are >> >> > in a non-init user_ns, if security.capability is set then you > cannot >> >> > set it; if security.capability is not set, then the kernel > writes >> >> > security.nscapability instead and returns success. >> >> > >> >> > I don't like magic, but this might be just straightforward > enough >> >> > to not be offensive. Thoughts? >> >> >> >> Yeah, I think it might be better to have the magic in this case, > since >> >> it seems weird to just reject setxattr if a tool didn't realize > it was >> >> in a namespace. I'm not sure -- it is also nice to have an > explicit >> >> API here. >> >> >> >> I would defer to Eric or Michael on that. I keep going back and > forth, >> >> though I suspect it's probably best to do what you already have >> >> (explicit API). >> > >> > Michael, Eric, what do you think? The choice we're making here is >> > whether we should >> > >> > 1. Keep a nice simple separate pair of xattrs, the pre-existing >> > security.capability which can only be written from init_user_ns, >> > and the new (in this patch) security.nscapability which you can >> > write to any file where you are privileged wrt the file. >> > >> > 2. Make security.capability somewhat 'magic' - if someone in a >> > non-initial user ns tries to write it and has privilege wrt the >> > file, then the kernel silently writes security.nscapability > instead. >> > >> > The biggest drawback of (1) would be any tar-like program trying >> > to restore a file which had security.capability, needing to know >> > to detect its userns and write the security.nscapability instead. >> > The drawback of (2) is ~\o/~ magic. >> >> Apologies for not having followed this more closely before. >> >> I don't like either option. I think we will be in much better shape > if >> we upgrade the capability xattr. It seems totally wrong or at least >> confusing for a file to have both capability xattrs. >> >> Just using security.capability allows us to confront any weird > issues >> with mixing both the old semantics and the new semantics. >> >> We had previously discussioned extending the capbility a little and >> adding a uid who needed to be the root uid in a user namespace, to > be >> valid. Using the owner of the file seems simpler, and even a little >> more transparent as this makes the security.capability xattr a > limited >> form of setuid (which it semantically is). >> >> So I believe the new semantics in general are an improvement. >> >> >> Given the expected use case let me ask as simple question: Are there > any >> known cases where the owner of a setcap exectuable is not root? >> >> I expect the pile of setcap exectuables is small enough we can go >> through the top distros and look at all of the setcap executlables. >> >> >> If there is not a need to support setcap executables owned by > non-root, >> I suspect the right play is to just change the semantics to always > treat >> the security.capability attribute this way. >> > > I guess I'm confused how we have strayed so far that this isn't an > obvious requirement. Uid=0 as being the root of privilege was the > basic problem that capabilities were designed to change. uid==0 as the owner of a file is slightly different from uid==0 of a running process. Last I checked if it is installed as part of a distribution the programs are owned by root by default. > Uid is an acl concept. Capabilities are supposed to be independent of > that. I don't have a clue what you mean. Posix capabilities on executables are part of discretionary access control. Whatever their rules posix capabilities are just watered down versions of the permissions of a setuid root exectuable. I don't think anyone has ever actually run a system with setuid root exectuables not being
Re: [PATCH 1/1] simplified security.nscapability xattr
"Serge E. Hallyn"writes: > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn wrote: >> > Quoting Kees Cook (keesc...@chromium.org): >> >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: >> >> > From: Serge Hallyn > ... >> >> This looks like userspace must knowingly be aware that it is in a >> >> namespace and to DTRT instead of it being translated by the kernel >> >> when setxattr is called under !init_user_ns? >> > >> > Yes - my libcap2 patch checks /proc/self/uid_map to decide that. If that >> > shows you are in init_user_ns then it uses security.capability, otherwise >> > it uses security.nscapability. >> > >> > I've occasionally considered having the xattr code do the quiet >> > substitution if need be. >> > >> > In fact, much of this structure comes from when I was still trying to >> > do multiple values per xattr. Given what we're doing here, we could >> > keep the xattr contents exactly the same, just changing the name. >> > So userspace could just get and set security.capability; if you are >> > in a non-init user_ns, if security.capability is set then you cannot >> > set it; if security.capability is not set, then the kernel writes >> > security.nscapability instead and returns success. >> > >> > I don't like magic, but this might be just straightforward enough >> > to not be offensive. Thoughts? >> >> Yeah, I think it might be better to have the magic in this case, since >> it seems weird to just reject setxattr if a tool didn't realize it was >> in a namespace. I'm not sure -- it is also nice to have an explicit >> API here. >> >> I would defer to Eric or Michael on that. I keep going back and forth, >> though I suspect it's probably best to do what you already have >> (explicit API). > > Michael, Eric, what do you think? The choice we're making here is > whether we should > > 1. Keep a nice simple separate pair of xattrs, the pre-existing > security.capability which can only be written from init_user_ns, > and the new (in this patch) security.nscapability which you can > write to any file where you are privileged wrt the file. > > 2. Make security.capability somewhat 'magic' - if someone in a > non-initial user ns tries to write it and has privilege wrt the > file, then the kernel silently writes security.nscapability instead. > > The biggest drawback of (1) would be any tar-like program trying > to restore a file which had security.capability, needing to know > to detect its userns and write the security.nscapability instead. > The drawback of (2) is ~\o/~ magic. Apologies for not having followed this more closely before. I don't like either option. I think we will be in much better shape if we upgrade the capability xattr. It seems totally wrong or at least confusing for a file to have both capability xattrs. Just using security.capability allows us to confront any weird issues with mixing both the old semantics and the new semantics. We had previously discussioned extending the capbility a little and adding a uid who needed to be the root uid in a user namespace, to be valid. Using the owner of the file seems simpler, and even a little more transparent as this makes the security.capability xattr a limited form of setuid (which it semantically is). So I believe the new semantics in general are an improvement. Given the expected use case let me ask as simple question: Are there any known cases where the owner of a setcap exectuable is not root? I expect the pile of setcap exectuables is small enough we can go through the top distros and look at all of the setcap executlables. If there is not a need to support setcap executables owned by non-root, I suspect the right play is to just change the semantics to always treat the security.capability attribute this way. If there is a need to support setcap exectualbes owned by non-root, then the current implementation is most likely unacceptable. As that problem case can not work in a container. My guess is that we can just reinterpret the current security.capable to only be valid when root owns the file in the initial user namespace. At which point backwards compatibility becomes trivial as the security.capable does not change, just the rules for setting it, and interpreting it. We should also ensure that the gid of the file is mapped into the user namespace where the uid is the root of the user namespace. So that we are effectively testing capable_wrtuid_and_gid on execute as well a read/write of the the xattr. Eric
Re: [PATCH 1/1] simplified security.nscapability xattr
"Serge E. Hallyn" writes: > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn wrote: >> > Quoting Kees Cook (keesc...@chromium.org): >> >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: >> >> > From: Serge Hallyn > ... >> >> This looks like userspace must knowingly be aware that it is in a >> >> namespace and to DTRT instead of it being translated by the kernel >> >> when setxattr is called under !init_user_ns? >> > >> > Yes - my libcap2 patch checks /proc/self/uid_map to decide that. If that >> > shows you are in init_user_ns then it uses security.capability, otherwise >> > it uses security.nscapability. >> > >> > I've occasionally considered having the xattr code do the quiet >> > substitution if need be. >> > >> > In fact, much of this structure comes from when I was still trying to >> > do multiple values per xattr. Given what we're doing here, we could >> > keep the xattr contents exactly the same, just changing the name. >> > So userspace could just get and set security.capability; if you are >> > in a non-init user_ns, if security.capability is set then you cannot >> > set it; if security.capability is not set, then the kernel writes >> > security.nscapability instead and returns success. >> > >> > I don't like magic, but this might be just straightforward enough >> > to not be offensive. Thoughts? >> >> Yeah, I think it might be better to have the magic in this case, since >> it seems weird to just reject setxattr if a tool didn't realize it was >> in a namespace. I'm not sure -- it is also nice to have an explicit >> API here. >> >> I would defer to Eric or Michael on that. I keep going back and forth, >> though I suspect it's probably best to do what you already have >> (explicit API). > > Michael, Eric, what do you think? The choice we're making here is > whether we should > > 1. Keep a nice simple separate pair of xattrs, the pre-existing > security.capability which can only be written from init_user_ns, > and the new (in this patch) security.nscapability which you can > write to any file where you are privileged wrt the file. > > 2. Make security.capability somewhat 'magic' - if someone in a > non-initial user ns tries to write it and has privilege wrt the > file, then the kernel silently writes security.nscapability instead. > > The biggest drawback of (1) would be any tar-like program trying > to restore a file which had security.capability, needing to know > to detect its userns and write the security.nscapability instead. > The drawback of (2) is ~\o/~ magic. Apologies for not having followed this more closely before. I don't like either option. I think we will be in much better shape if we upgrade the capability xattr. It seems totally wrong or at least confusing for a file to have both capability xattrs. Just using security.capability allows us to confront any weird issues with mixing both the old semantics and the new semantics. We had previously discussioned extending the capbility a little and adding a uid who needed to be the root uid in a user namespace, to be valid. Using the owner of the file seems simpler, and even a little more transparent as this makes the security.capability xattr a limited form of setuid (which it semantically is). So I believe the new semantics in general are an improvement. Given the expected use case let me ask as simple question: Are there any known cases where the owner of a setcap exectuable is not root? I expect the pile of setcap exectuables is small enough we can go through the top distros and look at all of the setcap executlables. If there is not a need to support setcap executables owned by non-root, I suspect the right play is to just change the semantics to always treat the security.capability attribute this way. If there is a need to support setcap exectualbes owned by non-root, then the current implementation is most likely unacceptable. As that problem case can not work in a container. My guess is that we can just reinterpret the current security.capable to only be valid when root owns the file in the initial user namespace. At which point backwards compatibility becomes trivial as the security.capable does not change, just the rules for setting it, and interpreting it. We should also ensure that the gid of the file is mapped into the user namespace where the uid is the root of the user namespace. So that we are effectively testing capable_wrtuid_and_gid on execute as well a read/write of the the xattr. Eric
Re: [PATCH 1/1] simplified security.nscapability xattr
On 05/02/2016 05:54 AM, Serge E. Hallyn wrote: > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallynwrote: >>> Quoting Kees Cook (keesc...@chromium.org): On Fri, Apr 22, 2016 at 10:26 AM, wrote: > From: Serge Hallyn > ... This looks like userspace must knowingly be aware that it is in a namespace and to DTRT instead of it being translated by the kernel when setxattr is called under !init_user_ns? >>> >>> Yes - my libcap2 patch checks /proc/self/uid_map to decide that. If that >>> shows you are in init_user_ns then it uses security.capability, otherwise >>> it uses security.nscapability. >>> >>> I've occasionally considered having the xattr code do the quiet >>> substitution if need be. >>> >>> In fact, much of this structure comes from when I was still trying to >>> do multiple values per xattr. Given what we're doing here, we could >>> keep the xattr contents exactly the same, just changing the name. >>> So userspace could just get and set security.capability; if you are >>> in a non-init user_ns, if security.capability is set then you cannot >>> set it; if security.capability is not set, then the kernel writes >>> security.nscapability instead and returns success. >>> >>> I don't like magic, but this might be just straightforward enough >>> to not be offensive. Thoughts? >> >> Yeah, I think it might be better to have the magic in this case, since >> it seems weird to just reject setxattr if a tool didn't realize it was >> in a namespace. I'm not sure -- it is also nice to have an explicit >> API here. >> >> I would defer to Eric or Michael on that. I keep going back and forth, >> though I suspect it's probably best to do what you already have >> (explicit API). > > Michael, Eric, what do you think? The choice we're making here is > whether we should > > 1. Keep a nice simple separate pair of xattrs, the pre-existing > security.capability which can only be written from init_user_ns, > and the new (in this patch) security.nscapability which you can > write to any file where you are privileged wrt the file. > > 2. Make security.capability somewhat 'magic' - if someone in a > non-initial user ns tries to write it and has privilege wrt the > file, then the kernel silently writes security.nscapability instead. > > The biggest drawback of (1) would be any tar-like program trying > to restore a file which had security.capability, needing to know > to detect its userns and write the security.nscapability instead. > The drawback of (2) is ~\o/~ magic. I have only (minor) thoughts from the interface perspective. (1) Sounds the source of possibly unpleasant surprises. (2) Is a little surprising, but less so if it's well documented, and it saves us the surprises of (1). So, (2) sounds better. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/
Re: [PATCH 1/1] simplified security.nscapability xattr
On 05/02/2016 05:54 AM, Serge E. Hallyn wrote: > On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: >> On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn wrote: >>> Quoting Kees Cook (keesc...@chromium.org): On Fri, Apr 22, 2016 at 10:26 AM, wrote: > From: Serge Hallyn > ... This looks like userspace must knowingly be aware that it is in a namespace and to DTRT instead of it being translated by the kernel when setxattr is called under !init_user_ns? >>> >>> Yes - my libcap2 patch checks /proc/self/uid_map to decide that. If that >>> shows you are in init_user_ns then it uses security.capability, otherwise >>> it uses security.nscapability. >>> >>> I've occasionally considered having the xattr code do the quiet >>> substitution if need be. >>> >>> In fact, much of this structure comes from when I was still trying to >>> do multiple values per xattr. Given what we're doing here, we could >>> keep the xattr contents exactly the same, just changing the name. >>> So userspace could just get and set security.capability; if you are >>> in a non-init user_ns, if security.capability is set then you cannot >>> set it; if security.capability is not set, then the kernel writes >>> security.nscapability instead and returns success. >>> >>> I don't like magic, but this might be just straightforward enough >>> to not be offensive. Thoughts? >> >> Yeah, I think it might be better to have the magic in this case, since >> it seems weird to just reject setxattr if a tool didn't realize it was >> in a namespace. I'm not sure -- it is also nice to have an explicit >> API here. >> >> I would defer to Eric or Michael on that. I keep going back and forth, >> though I suspect it's probably best to do what you already have >> (explicit API). > > Michael, Eric, what do you think? The choice we're making here is > whether we should > > 1. Keep a nice simple separate pair of xattrs, the pre-existing > security.capability which can only be written from init_user_ns, > and the new (in this patch) security.nscapability which you can > write to any file where you are privileged wrt the file. > > 2. Make security.capability somewhat 'magic' - if someone in a > non-initial user ns tries to write it and has privilege wrt the > file, then the kernel silently writes security.nscapability instead. > > The biggest drawback of (1) would be any tar-like program trying > to restore a file which had security.capability, needing to know > to detect its userns and write the security.nscapability instead. > The drawback of (2) is ~\o/~ magic. I have only (minor) thoughts from the interface perspective. (1) Sounds the source of possibly unpleasant surprises. (2) Is a little surprising, but less so if it's well documented, and it saves us the surprises of (1). So, (2) sounds better. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/
Re: [PATCH 1/1] simplified security.nscapability xattr
On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: > On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallynwrote: > > Quoting Kees Cook (keesc...@chromium.org): > >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: > >> > From: Serge Hallyn ... > >> This looks like userspace must knowingly be aware that it is in a > >> namespace and to DTRT instead of it being translated by the kernel > >> when setxattr is called under !init_user_ns? > > > > Yes - my libcap2 patch checks /proc/self/uid_map to decide that. If that > > shows you are in init_user_ns then it uses security.capability, otherwise > > it uses security.nscapability. > > > > I've occasionally considered having the xattr code do the quiet > > substitution if need be. > > > > In fact, much of this structure comes from when I was still trying to > > do multiple values per xattr. Given what we're doing here, we could > > keep the xattr contents exactly the same, just changing the name. > > So userspace could just get and set security.capability; if you are > > in a non-init user_ns, if security.capability is set then you cannot > > set it; if security.capability is not set, then the kernel writes > > security.nscapability instead and returns success. > > > > I don't like magic, but this might be just straightforward enough > > to not be offensive. Thoughts? > > Yeah, I think it might be better to have the magic in this case, since > it seems weird to just reject setxattr if a tool didn't realize it was > in a namespace. I'm not sure -- it is also nice to have an explicit > API here. > > I would defer to Eric or Michael on that. I keep going back and forth, > though I suspect it's probably best to do what you already have > (explicit API). Michael, Eric, what do you think? The choice we're making here is whether we should 1. Keep a nice simple separate pair of xattrs, the pre-existing security.capability which can only be written from init_user_ns, and the new (in this patch) security.nscapability which you can write to any file where you are privileged wrt the file. 2. Make security.capability somewhat 'magic' - if someone in a non-initial user ns tries to write it and has privilege wrt the file, then the kernel silently writes security.nscapability instead. The biggest drawback of (1) would be any tar-like program trying to restore a file which had security.capability, needing to know to detect its userns and write the security.nscapability instead. The drawback of (2) is ~\o/~ magic. -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: > On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn wrote: > > Quoting Kees Cook (keesc...@chromium.org): > >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: > >> > From: Serge Hallyn ... > >> This looks like userspace must knowingly be aware that it is in a > >> namespace and to DTRT instead of it being translated by the kernel > >> when setxattr is called under !init_user_ns? > > > > Yes - my libcap2 patch checks /proc/self/uid_map to decide that. If that > > shows you are in init_user_ns then it uses security.capability, otherwise > > it uses security.nscapability. > > > > I've occasionally considered having the xattr code do the quiet > > substitution if need be. > > > > In fact, much of this structure comes from when I was still trying to > > do multiple values per xattr. Given what we're doing here, we could > > keep the xattr contents exactly the same, just changing the name. > > So userspace could just get and set security.capability; if you are > > in a non-init user_ns, if security.capability is set then you cannot > > set it; if security.capability is not set, then the kernel writes > > security.nscapability instead and returns success. > > > > I don't like magic, but this might be just straightforward enough > > to not be offensive. Thoughts? > > Yeah, I think it might be better to have the magic in this case, since > it seems weird to just reject setxattr if a tool didn't realize it was > in a namespace. I'm not sure -- it is also nice to have an explicit > API here. > > I would defer to Eric or Michael on that. I keep going back and forth, > though I suspect it's probably best to do what you already have > (explicit API). Michael, Eric, what do you think? The choice we're making here is whether we should 1. Keep a nice simple separate pair of xattrs, the pre-existing security.capability which can only be written from init_user_ns, and the new (in this patch) security.nscapability which you can write to any file where you are privileged wrt the file. 2. Make security.capability somewhat 'magic' - if someone in a non-initial user ns tries to write it and has privilege wrt the file, then the kernel silently writes security.nscapability instead. The biggest drawback of (1) would be any tar-like program trying to restore a file which had security.capability, needing to know to detect its userns and write the security.nscapability instead. The drawback of (2) is ~\o/~ magic. -serge
Re: [PATCH 1/1] simplified security.nscapability xattr
On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: > On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallynwrote: > > Quoting Kees Cook (keesc...@chromium.org): > >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: > >> > From: Serge Hallyn > >> > > >> > This can only be set by root in his own namespace, and will > >> > only be respected by namespaces with that same root kuid > >> > mapped as root, or namespaces descended from it. > >> > > >> > This allows a simple setxattr to work, allows tar/untar to > >> > work, and allows us to tar in one namespace and untar in > >> > another while preserving the capability, without risking > >> > leaking privilege into a parent namespace. > >> > >> The concept seems sensible to me. Various notes below... > >> > >> > > >> > Signed-off-by: Serge Hallyn > >> > --- > >> > include/linux/capability.h |5 ++- > >> > include/uapi/linux/capability.h | 18 > >> > include/uapi/linux/xattr.h |3 ++ > >> > security/commoncap.c| 91 > >> > +-- > >> > 4 files changed, 112 insertions(+), 5 deletions(-) > >> > > >> > diff --git a/include/linux/capability.h b/include/linux/capability.h > >> > index 00690ff..cf533ff 100644 > >> > --- a/include/linux/capability.h > >> > +++ b/include/linux/capability.h > >> > @@ -13,7 +13,7 @@ > >> > #define _LINUX_CAPABILITY_H > >> > > >> > #include > >> > - > >> > +#include > >> > > >> > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 > >> > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 > >> > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { > >> > kernel_cap_t inheritable; > >> > }; > >> > > >> > +#define NS_CAPS_VERSION(x) (x & 0xFF) > >> > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) > >> > + > >> > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) > >> > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) > >> > > >> > diff --git a/include/uapi/linux/capability.h > >> > b/include/uapi/linux/capability.h > >> > index 12c37a1..f0b4a66 100644 > >> > --- a/include/uapi/linux/capability.h > >> > +++ b/include/uapi/linux/capability.h > >> > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { > >> > #define VFS_CAP_U32_2 2 > >> > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) > >> > > >> > +/* version number for security.nscapability xattrs hdr->hdr_info */ > >> > +#define VFS_NS_CAP_REVISION 1 > >> > + > >> > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 > >> > #define VFS_CAP_U32 VFS_CAP_U32_2 > >> > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 > >> > @@ -74,6 +77,21 @@ struct vfs_cap_data { > >> > } data[VFS_CAP_U32]; > >> > }; > >> > > >> > +#define VFS_NS_CAP_EFFECTIVE0x1 > >> > +/* > >> > + * 32-bit hdr_info contains > >> > + * 16 leftmost: reserved > >> > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) > >> > + * last 8: version > >> > + */ > >> > +struct vfs_ns_cap_data { > >> > + __le32 magic_etc; > >> > + struct { > >> > + __le32 permitted;/* Little endian */ > >> > + __le32 inheritable; /* Little endian */ > >> > + } data[VFS_CAP_U32]; > >> > +}; > >> > >> This is identical to vfs_cap_data. Is there a reason not to re-use the > >> existing one? > > > > Hm. I used to have them completely different. ATM the only difference > > is what goes into the magic_etc, and that not very (different). So > > yeah it probably should be re-used. > > > >> > + > >> > #ifndef __KERNEL__ > >> > > >> > /* > >> > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h > >> > index 1590c49..67c80ab 100644 > >> > --- a/include/uapi/linux/xattr.h > >> > +++ b/include/uapi/linux/xattr.h > >> > @@ -68,6 +68,9 @@ > >> > #define XATTR_CAPS_SUFFIX "capability" > >> > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX > >> > > >> > +#define XATTR_NS_CAPS_SUFFIX "nscapability" > >> > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX > >> > + > >> > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" > >> > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX > >> > XATTR_POSIX_ACL_ACCESS > >> > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" > >> > >> Are these documented anywhere in Documentation/ or in man pages? This > >> seems like it'd need a man-page update too. > > > > Yeah, if we decide we're ok with this strategy I'll update those. > > > >> > diff --git a/security/commoncap.c b/security/commoncap.c > >> > index 48071ed..8f3f34a 100644 > >> > --- a/security/commoncap.c > >> > +++ b/security/commoncap.c > >> > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) > >> > if (!inode->i_op->getxattr) > >> >return 0; > >> > > >> > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS,
Re: [PATCH 1/1] simplified security.nscapability xattr
On Tue, Apr 26, 2016 at 03:39:54PM -0700, Kees Cook wrote: > On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn wrote: > > Quoting Kees Cook (keesc...@chromium.org): > >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: > >> > From: Serge Hallyn > >> > > >> > This can only be set by root in his own namespace, and will > >> > only be respected by namespaces with that same root kuid > >> > mapped as root, or namespaces descended from it. > >> > > >> > This allows a simple setxattr to work, allows tar/untar to > >> > work, and allows us to tar in one namespace and untar in > >> > another while preserving the capability, without risking > >> > leaking privilege into a parent namespace. > >> > >> The concept seems sensible to me. Various notes below... > >> > >> > > >> > Signed-off-by: Serge Hallyn > >> > --- > >> > include/linux/capability.h |5 ++- > >> > include/uapi/linux/capability.h | 18 > >> > include/uapi/linux/xattr.h |3 ++ > >> > security/commoncap.c| 91 > >> > +-- > >> > 4 files changed, 112 insertions(+), 5 deletions(-) > >> > > >> > diff --git a/include/linux/capability.h b/include/linux/capability.h > >> > index 00690ff..cf533ff 100644 > >> > --- a/include/linux/capability.h > >> > +++ b/include/linux/capability.h > >> > @@ -13,7 +13,7 @@ > >> > #define _LINUX_CAPABILITY_H > >> > > >> > #include > >> > - > >> > +#include > >> > > >> > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 > >> > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 > >> > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { > >> > kernel_cap_t inheritable; > >> > }; > >> > > >> > +#define NS_CAPS_VERSION(x) (x & 0xFF) > >> > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) > >> > + > >> > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) > >> > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) > >> > > >> > diff --git a/include/uapi/linux/capability.h > >> > b/include/uapi/linux/capability.h > >> > index 12c37a1..f0b4a66 100644 > >> > --- a/include/uapi/linux/capability.h > >> > +++ b/include/uapi/linux/capability.h > >> > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { > >> > #define VFS_CAP_U32_2 2 > >> > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) > >> > > >> > +/* version number for security.nscapability xattrs hdr->hdr_info */ > >> > +#define VFS_NS_CAP_REVISION 1 > >> > + > >> > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 > >> > #define VFS_CAP_U32 VFS_CAP_U32_2 > >> > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 > >> > @@ -74,6 +77,21 @@ struct vfs_cap_data { > >> > } data[VFS_CAP_U32]; > >> > }; > >> > > >> > +#define VFS_NS_CAP_EFFECTIVE0x1 > >> > +/* > >> > + * 32-bit hdr_info contains > >> > + * 16 leftmost: reserved > >> > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) > >> > + * last 8: version > >> > + */ > >> > +struct vfs_ns_cap_data { > >> > + __le32 magic_etc; > >> > + struct { > >> > + __le32 permitted;/* Little endian */ > >> > + __le32 inheritable; /* Little endian */ > >> > + } data[VFS_CAP_U32]; > >> > +}; > >> > >> This is identical to vfs_cap_data. Is there a reason not to re-use the > >> existing one? > > > > Hm. I used to have them completely different. ATM the only difference > > is what goes into the magic_etc, and that not very (different). So > > yeah it probably should be re-used. > > > >> > + > >> > #ifndef __KERNEL__ > >> > > >> > /* > >> > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h > >> > index 1590c49..67c80ab 100644 > >> > --- a/include/uapi/linux/xattr.h > >> > +++ b/include/uapi/linux/xattr.h > >> > @@ -68,6 +68,9 @@ > >> > #define XATTR_CAPS_SUFFIX "capability" > >> > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX > >> > > >> > +#define XATTR_NS_CAPS_SUFFIX "nscapability" > >> > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX > >> > + > >> > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" > >> > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX > >> > XATTR_POSIX_ACL_ACCESS > >> > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" > >> > >> Are these documented anywhere in Documentation/ or in man pages? This > >> seems like it'd need a man-page update too. > > > > Yeah, if we decide we're ok with this strategy I'll update those. > > > >> > diff --git a/security/commoncap.c b/security/commoncap.c > >> > index 48071ed..8f3f34a 100644 > >> > --- a/security/commoncap.c > >> > +++ b/security/commoncap.c > >> > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) > >> > if (!inode->i_op->getxattr) > >> >return 0; > >> > > >> > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, > >> > 0); > >> > + if (error > 0) > >> > + return 1; > >> > + > >>
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Kees Cook (keesc...@chromium.org): > On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallynwrote: > > Quoting Kees Cook (keesc...@chromium.org): > >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: > >> > From: Serge Hallyn > >> > > >> > This can only be set by root in his own namespace, and will > >> > only be respected by namespaces with that same root kuid > >> > mapped as root, or namespaces descended from it. > >> > > >> > This allows a simple setxattr to work, allows tar/untar to > >> > work, and allows us to tar in one namespace and untar in > >> > another while preserving the capability, without risking > >> > leaking privilege into a parent namespace. > >> > >> The concept seems sensible to me. Various notes below... > >> > >> > > >> > Signed-off-by: Serge Hallyn > >> > --- > >> > include/linux/capability.h |5 ++- > >> > include/uapi/linux/capability.h | 18 > >> > include/uapi/linux/xattr.h |3 ++ > >> > security/commoncap.c| 91 > >> > +-- > >> > 4 files changed, 112 insertions(+), 5 deletions(-) > >> > > >> > diff --git a/include/linux/capability.h b/include/linux/capability.h > >> > index 00690ff..cf533ff 100644 > >> > --- a/include/linux/capability.h > >> > +++ b/include/linux/capability.h > >> > @@ -13,7 +13,7 @@ > >> > #define _LINUX_CAPABILITY_H > >> > > >> > #include > >> > - > >> > +#include > >> > > >> > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 > >> > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 > >> > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { > >> > kernel_cap_t inheritable; > >> > }; > >> > > >> > +#define NS_CAPS_VERSION(x) (x & 0xFF) > >> > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) > >> > + > >> > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) > >> > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) > >> > > >> > diff --git a/include/uapi/linux/capability.h > >> > b/include/uapi/linux/capability.h > >> > index 12c37a1..f0b4a66 100644 > >> > --- a/include/uapi/linux/capability.h > >> > +++ b/include/uapi/linux/capability.h > >> > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { > >> > #define VFS_CAP_U32_2 2 > >> > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) > >> > > >> > +/* version number for security.nscapability xattrs hdr->hdr_info */ > >> > +#define VFS_NS_CAP_REVISION 1 > >> > + > >> > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 > >> > #define VFS_CAP_U32 VFS_CAP_U32_2 > >> > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 > >> > @@ -74,6 +77,21 @@ struct vfs_cap_data { > >> > } data[VFS_CAP_U32]; > >> > }; > >> > > >> > +#define VFS_NS_CAP_EFFECTIVE0x1 > >> > +/* > >> > + * 32-bit hdr_info contains > >> > + * 16 leftmost: reserved > >> > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) > >> > + * last 8: version > >> > + */ > >> > +struct vfs_ns_cap_data { > >> > + __le32 magic_etc; > >> > + struct { > >> > + __le32 permitted;/* Little endian */ > >> > + __le32 inheritable; /* Little endian */ > >> > + } data[VFS_CAP_U32]; > >> > +}; > >> > >> This is identical to vfs_cap_data. Is there a reason not to re-use the > >> existing one? > > > > Hm. I used to have them completely different. ATM the only difference > > is what goes into the magic_etc, and that not very (different). So > > yeah it probably should be re-used. > > > >> > + > >> > #ifndef __KERNEL__ > >> > > >> > /* > >> > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h > >> > index 1590c49..67c80ab 100644 > >> > --- a/include/uapi/linux/xattr.h > >> > +++ b/include/uapi/linux/xattr.h > >> > @@ -68,6 +68,9 @@ > >> > #define XATTR_CAPS_SUFFIX "capability" > >> > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX > >> > > >> > +#define XATTR_NS_CAPS_SUFFIX "nscapability" > >> > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX > >> > + > >> > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" > >> > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX > >> > XATTR_POSIX_ACL_ACCESS > >> > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" > >> > >> Are these documented anywhere in Documentation/ or in man pages? This > >> seems like it'd need a man-page update too. > > > > Yeah, if we decide we're ok with this strategy I'll update those. > > > >> > diff --git a/security/commoncap.c b/security/commoncap.c > >> > index 48071ed..8f3f34a 100644 > >> > --- a/security/commoncap.c > >> > +++ b/security/commoncap.c > >> > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) > >> > if (!inode->i_op->getxattr) > >> >return 0; > >> > > >> > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, > >> > 0);
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Kees Cook (keesc...@chromium.org): > On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn wrote: > > Quoting Kees Cook (keesc...@chromium.org): > >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: > >> > From: Serge Hallyn > >> > > >> > This can only be set by root in his own namespace, and will > >> > only be respected by namespaces with that same root kuid > >> > mapped as root, or namespaces descended from it. > >> > > >> > This allows a simple setxattr to work, allows tar/untar to > >> > work, and allows us to tar in one namespace and untar in > >> > another while preserving the capability, without risking > >> > leaking privilege into a parent namespace. > >> > >> The concept seems sensible to me. Various notes below... > >> > >> > > >> > Signed-off-by: Serge Hallyn > >> > --- > >> > include/linux/capability.h |5 ++- > >> > include/uapi/linux/capability.h | 18 > >> > include/uapi/linux/xattr.h |3 ++ > >> > security/commoncap.c| 91 > >> > +-- > >> > 4 files changed, 112 insertions(+), 5 deletions(-) > >> > > >> > diff --git a/include/linux/capability.h b/include/linux/capability.h > >> > index 00690ff..cf533ff 100644 > >> > --- a/include/linux/capability.h > >> > +++ b/include/linux/capability.h > >> > @@ -13,7 +13,7 @@ > >> > #define _LINUX_CAPABILITY_H > >> > > >> > #include > >> > - > >> > +#include > >> > > >> > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 > >> > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 > >> > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { > >> > kernel_cap_t inheritable; > >> > }; > >> > > >> > +#define NS_CAPS_VERSION(x) (x & 0xFF) > >> > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) > >> > + > >> > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) > >> > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) > >> > > >> > diff --git a/include/uapi/linux/capability.h > >> > b/include/uapi/linux/capability.h > >> > index 12c37a1..f0b4a66 100644 > >> > --- a/include/uapi/linux/capability.h > >> > +++ b/include/uapi/linux/capability.h > >> > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { > >> > #define VFS_CAP_U32_2 2 > >> > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) > >> > > >> > +/* version number for security.nscapability xattrs hdr->hdr_info */ > >> > +#define VFS_NS_CAP_REVISION 1 > >> > + > >> > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 > >> > #define VFS_CAP_U32 VFS_CAP_U32_2 > >> > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 > >> > @@ -74,6 +77,21 @@ struct vfs_cap_data { > >> > } data[VFS_CAP_U32]; > >> > }; > >> > > >> > +#define VFS_NS_CAP_EFFECTIVE0x1 > >> > +/* > >> > + * 32-bit hdr_info contains > >> > + * 16 leftmost: reserved > >> > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) > >> > + * last 8: version > >> > + */ > >> > +struct vfs_ns_cap_data { > >> > + __le32 magic_etc; > >> > + struct { > >> > + __le32 permitted;/* Little endian */ > >> > + __le32 inheritable; /* Little endian */ > >> > + } data[VFS_CAP_U32]; > >> > +}; > >> > >> This is identical to vfs_cap_data. Is there a reason not to re-use the > >> existing one? > > > > Hm. I used to have them completely different. ATM the only difference > > is what goes into the magic_etc, and that not very (different). So > > yeah it probably should be re-used. > > > >> > + > >> > #ifndef __KERNEL__ > >> > > >> > /* > >> > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h > >> > index 1590c49..67c80ab 100644 > >> > --- a/include/uapi/linux/xattr.h > >> > +++ b/include/uapi/linux/xattr.h > >> > @@ -68,6 +68,9 @@ > >> > #define XATTR_CAPS_SUFFIX "capability" > >> > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX > >> > > >> > +#define XATTR_NS_CAPS_SUFFIX "nscapability" > >> > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX > >> > + > >> > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" > >> > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX > >> > XATTR_POSIX_ACL_ACCESS > >> > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" > >> > >> Are these documented anywhere in Documentation/ or in man pages? This > >> seems like it'd need a man-page update too. > > > > Yeah, if we decide we're ok with this strategy I'll update those. > > > >> > diff --git a/security/commoncap.c b/security/commoncap.c > >> > index 48071ed..8f3f34a 100644 > >> > --- a/security/commoncap.c > >> > +++ b/security/commoncap.c > >> > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) > >> > if (!inode->i_op->getxattr) > >> >return 0; > >> > > >> > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, > >> > 0); > >> > + if (error > 0) > >> > + return 1; > >> > + > >> > error
Re: [PATCH 1/1] simplified security.nscapability xattr
On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallynwrote: > Quoting Kees Cook (keesc...@chromium.org): >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: >> > From: Serge Hallyn >> > >> > This can only be set by root in his own namespace, and will >> > only be respected by namespaces with that same root kuid >> > mapped as root, or namespaces descended from it. >> > >> > This allows a simple setxattr to work, allows tar/untar to >> > work, and allows us to tar in one namespace and untar in >> > another while preserving the capability, without risking >> > leaking privilege into a parent namespace. >> >> The concept seems sensible to me. Various notes below... >> >> > >> > Signed-off-by: Serge Hallyn >> > --- >> > include/linux/capability.h |5 ++- >> > include/uapi/linux/capability.h | 18 >> > include/uapi/linux/xattr.h |3 ++ >> > security/commoncap.c| 91 >> > +-- >> > 4 files changed, 112 insertions(+), 5 deletions(-) >> > >> > diff --git a/include/linux/capability.h b/include/linux/capability.h >> > index 00690ff..cf533ff 100644 >> > --- a/include/linux/capability.h >> > +++ b/include/linux/capability.h >> > @@ -13,7 +13,7 @@ >> > #define _LINUX_CAPABILITY_H >> > >> > #include >> > - >> > +#include >> > >> > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 >> > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 >> > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { >> > kernel_cap_t inheritable; >> > }; >> > >> > +#define NS_CAPS_VERSION(x) (x & 0xFF) >> > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) >> > + >> > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) >> > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) >> > >> > diff --git a/include/uapi/linux/capability.h >> > b/include/uapi/linux/capability.h >> > index 12c37a1..f0b4a66 100644 >> > --- a/include/uapi/linux/capability.h >> > +++ b/include/uapi/linux/capability.h >> > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { >> > #define VFS_CAP_U32_2 2 >> > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) >> > >> > +/* version number for security.nscapability xattrs hdr->hdr_info */ >> > +#define VFS_NS_CAP_REVISION 1 >> > + >> > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 >> > #define VFS_CAP_U32 VFS_CAP_U32_2 >> > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 >> > @@ -74,6 +77,21 @@ struct vfs_cap_data { >> > } data[VFS_CAP_U32]; >> > }; >> > >> > +#define VFS_NS_CAP_EFFECTIVE0x1 >> > +/* >> > + * 32-bit hdr_info contains >> > + * 16 leftmost: reserved >> > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) >> > + * last 8: version >> > + */ >> > +struct vfs_ns_cap_data { >> > + __le32 magic_etc; >> > + struct { >> > + __le32 permitted;/* Little endian */ >> > + __le32 inheritable; /* Little endian */ >> > + } data[VFS_CAP_U32]; >> > +}; >> >> This is identical to vfs_cap_data. Is there a reason not to re-use the >> existing one? > > Hm. I used to have them completely different. ATM the only difference > is what goes into the magic_etc, and that not very (different). So > yeah it probably should be re-used. > >> > + >> > #ifndef __KERNEL__ >> > >> > /* >> > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h >> > index 1590c49..67c80ab 100644 >> > --- a/include/uapi/linux/xattr.h >> > +++ b/include/uapi/linux/xattr.h >> > @@ -68,6 +68,9 @@ >> > #define XATTR_CAPS_SUFFIX "capability" >> > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX >> > >> > +#define XATTR_NS_CAPS_SUFFIX "nscapability" >> > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX >> > + >> > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" >> > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX >> > XATTR_POSIX_ACL_ACCESS >> > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" >> >> Are these documented anywhere in Documentation/ or in man pages? This >> seems like it'd need a man-page update too. > > Yeah, if we decide we're ok with this strategy I'll update those. > >> > diff --git a/security/commoncap.c b/security/commoncap.c >> > index 48071ed..8f3f34a 100644 >> > --- a/security/commoncap.c >> > +++ b/security/commoncap.c >> > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) >> > if (!inode->i_op->getxattr) >> >return 0; >> > >> > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, 0); >> > + if (error > 0) >> > + return 1; >> > + >> > error = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0); >> > if (error <= 0) >> > return 0; >> >> I think this might be more readable if the getxattr calls were >> standardized (one
Re: [PATCH 1/1] simplified security.nscapability xattr
On Tue, Apr 26, 2016 at 3:26 PM, Serge E. Hallyn wrote: > Quoting Kees Cook (keesc...@chromium.org): >> On Fri, Apr 22, 2016 at 10:26 AM, wrote: >> > From: Serge Hallyn >> > >> > This can only be set by root in his own namespace, and will >> > only be respected by namespaces with that same root kuid >> > mapped as root, or namespaces descended from it. >> > >> > This allows a simple setxattr to work, allows tar/untar to >> > work, and allows us to tar in one namespace and untar in >> > another while preserving the capability, without risking >> > leaking privilege into a parent namespace. >> >> The concept seems sensible to me. Various notes below... >> >> > >> > Signed-off-by: Serge Hallyn >> > --- >> > include/linux/capability.h |5 ++- >> > include/uapi/linux/capability.h | 18 >> > include/uapi/linux/xattr.h |3 ++ >> > security/commoncap.c| 91 >> > +-- >> > 4 files changed, 112 insertions(+), 5 deletions(-) >> > >> > diff --git a/include/linux/capability.h b/include/linux/capability.h >> > index 00690ff..cf533ff 100644 >> > --- a/include/linux/capability.h >> > +++ b/include/linux/capability.h >> > @@ -13,7 +13,7 @@ >> > #define _LINUX_CAPABILITY_H >> > >> > #include >> > - >> > +#include >> > >> > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 >> > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 >> > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { >> > kernel_cap_t inheritable; >> > }; >> > >> > +#define NS_CAPS_VERSION(x) (x & 0xFF) >> > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) >> > + >> > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) >> > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) >> > >> > diff --git a/include/uapi/linux/capability.h >> > b/include/uapi/linux/capability.h >> > index 12c37a1..f0b4a66 100644 >> > --- a/include/uapi/linux/capability.h >> > +++ b/include/uapi/linux/capability.h >> > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { >> > #define VFS_CAP_U32_2 2 >> > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) >> > >> > +/* version number for security.nscapability xattrs hdr->hdr_info */ >> > +#define VFS_NS_CAP_REVISION 1 >> > + >> > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 >> > #define VFS_CAP_U32 VFS_CAP_U32_2 >> > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 >> > @@ -74,6 +77,21 @@ struct vfs_cap_data { >> > } data[VFS_CAP_U32]; >> > }; >> > >> > +#define VFS_NS_CAP_EFFECTIVE0x1 >> > +/* >> > + * 32-bit hdr_info contains >> > + * 16 leftmost: reserved >> > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) >> > + * last 8: version >> > + */ >> > +struct vfs_ns_cap_data { >> > + __le32 magic_etc; >> > + struct { >> > + __le32 permitted;/* Little endian */ >> > + __le32 inheritable; /* Little endian */ >> > + } data[VFS_CAP_U32]; >> > +}; >> >> This is identical to vfs_cap_data. Is there a reason not to re-use the >> existing one? > > Hm. I used to have them completely different. ATM the only difference > is what goes into the magic_etc, and that not very (different). So > yeah it probably should be re-used. > >> > + >> > #ifndef __KERNEL__ >> > >> > /* >> > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h >> > index 1590c49..67c80ab 100644 >> > --- a/include/uapi/linux/xattr.h >> > +++ b/include/uapi/linux/xattr.h >> > @@ -68,6 +68,9 @@ >> > #define XATTR_CAPS_SUFFIX "capability" >> > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX >> > >> > +#define XATTR_NS_CAPS_SUFFIX "nscapability" >> > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX >> > + >> > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" >> > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX >> > XATTR_POSIX_ACL_ACCESS >> > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" >> >> Are these documented anywhere in Documentation/ or in man pages? This >> seems like it'd need a man-page update too. > > Yeah, if we decide we're ok with this strategy I'll update those. > >> > diff --git a/security/commoncap.c b/security/commoncap.c >> > index 48071ed..8f3f34a 100644 >> > --- a/security/commoncap.c >> > +++ b/security/commoncap.c >> > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) >> > if (!inode->i_op->getxattr) >> >return 0; >> > >> > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, 0); >> > + if (error > 0) >> > + return 1; >> > + >> > error = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0); >> > if (error <= 0) >> > return 0; >> >> I think this might be more readable if the getxattr calls were >> standardized (one returns 1, the other returns 0, with inverted tests >> -- hard to read). IIUC, if either returns
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Kees Cook (keesc...@chromium.org): > On Fri, Apr 22, 2016 at 10:26 AM,wrote: > > From: Serge Hallyn > > > > This can only be set by root in his own namespace, and will > > only be respected by namespaces with that same root kuid > > mapped as root, or namespaces descended from it. > > > > This allows a simple setxattr to work, allows tar/untar to > > work, and allows us to tar in one namespace and untar in > > another while preserving the capability, without risking > > leaking privilege into a parent namespace. > > The concept seems sensible to me. Various notes below... > > > > > Signed-off-by: Serge Hallyn > > --- > > include/linux/capability.h |5 ++- > > include/uapi/linux/capability.h | 18 > > include/uapi/linux/xattr.h |3 ++ > > security/commoncap.c| 91 > > +-- > > 4 files changed, 112 insertions(+), 5 deletions(-) > > > > diff --git a/include/linux/capability.h b/include/linux/capability.h > > index 00690ff..cf533ff 100644 > > --- a/include/linux/capability.h > > +++ b/include/linux/capability.h > > @@ -13,7 +13,7 @@ > > #define _LINUX_CAPABILITY_H > > > > #include > > - > > +#include > > > > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 > > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 > > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { > > kernel_cap_t inheritable; > > }; > > > > +#define NS_CAPS_VERSION(x) (x & 0xFF) > > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) > > + > > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) > > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) > > > > diff --git a/include/uapi/linux/capability.h > > b/include/uapi/linux/capability.h > > index 12c37a1..f0b4a66 100644 > > --- a/include/uapi/linux/capability.h > > +++ b/include/uapi/linux/capability.h > > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { > > #define VFS_CAP_U32_2 2 > > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) > > > > +/* version number for security.nscapability xattrs hdr->hdr_info */ > > +#define VFS_NS_CAP_REVISION 1 > > + > > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 > > #define VFS_CAP_U32 VFS_CAP_U32_2 > > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 > > @@ -74,6 +77,21 @@ struct vfs_cap_data { > > } data[VFS_CAP_U32]; > > }; > > > > +#define VFS_NS_CAP_EFFECTIVE0x1 > > +/* > > + * 32-bit hdr_info contains > > + * 16 leftmost: reserved > > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) > > + * last 8: version > > + */ > > +struct vfs_ns_cap_data { > > + __le32 magic_etc; > > + struct { > > + __le32 permitted;/* Little endian */ > > + __le32 inheritable; /* Little endian */ > > + } data[VFS_CAP_U32]; > > +}; > > This is identical to vfs_cap_data. Is there a reason not to re-use the > existing one? Hm. I used to have them completely different. ATM the only difference is what goes into the magic_etc, and that not very (different). So yeah it probably should be re-used. > > + > > #ifndef __KERNEL__ > > > > /* > > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h > > index 1590c49..67c80ab 100644 > > --- a/include/uapi/linux/xattr.h > > +++ b/include/uapi/linux/xattr.h > > @@ -68,6 +68,9 @@ > > #define XATTR_CAPS_SUFFIX "capability" > > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX > > > > +#define XATTR_NS_CAPS_SUFFIX "nscapability" > > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX > > + > > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" > > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX > > XATTR_POSIX_ACL_ACCESS > > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" > > Are these documented anywhere in Documentation/ or in man pages? This > seems like it'd need a man-page update too. Yeah, if we decide we're ok with this strategy I'll update those. > > diff --git a/security/commoncap.c b/security/commoncap.c > > index 48071ed..8f3f34a 100644 > > --- a/security/commoncap.c > > +++ b/security/commoncap.c > > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) > > if (!inode->i_op->getxattr) > >return 0; > > > > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, 0); > > + if (error > 0) > > + return 1; > > + > > error = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0); > > if (error <= 0) > > return 0; > > I think this might be more readable if the getxattr calls were > standardized (one returns 1, the other returns 0, with inverted tests > -- hard to read). IIUC, if either returns > 0, return 1, otherwise > return 0. Hm. Yeah I can see where that would be confusing. I can change the flow to
Re: [PATCH 1/1] simplified security.nscapability xattr
Quoting Kees Cook (keesc...@chromium.org): > On Fri, Apr 22, 2016 at 10:26 AM, wrote: > > From: Serge Hallyn > > > > This can only be set by root in his own namespace, and will > > only be respected by namespaces with that same root kuid > > mapped as root, or namespaces descended from it. > > > > This allows a simple setxattr to work, allows tar/untar to > > work, and allows us to tar in one namespace and untar in > > another while preserving the capability, without risking > > leaking privilege into a parent namespace. > > The concept seems sensible to me. Various notes below... > > > > > Signed-off-by: Serge Hallyn > > --- > > include/linux/capability.h |5 ++- > > include/uapi/linux/capability.h | 18 > > include/uapi/linux/xattr.h |3 ++ > > security/commoncap.c| 91 > > +-- > > 4 files changed, 112 insertions(+), 5 deletions(-) > > > > diff --git a/include/linux/capability.h b/include/linux/capability.h > > index 00690ff..cf533ff 100644 > > --- a/include/linux/capability.h > > +++ b/include/linux/capability.h > > @@ -13,7 +13,7 @@ > > #define _LINUX_CAPABILITY_H > > > > #include > > - > > +#include > > > > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 > > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 > > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { > > kernel_cap_t inheritable; > > }; > > > > +#define NS_CAPS_VERSION(x) (x & 0xFF) > > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) > > + > > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) > > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) > > > > diff --git a/include/uapi/linux/capability.h > > b/include/uapi/linux/capability.h > > index 12c37a1..f0b4a66 100644 > > --- a/include/uapi/linux/capability.h > > +++ b/include/uapi/linux/capability.h > > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { > > #define VFS_CAP_U32_2 2 > > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) > > > > +/* version number for security.nscapability xattrs hdr->hdr_info */ > > +#define VFS_NS_CAP_REVISION 1 > > + > > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 > > #define VFS_CAP_U32 VFS_CAP_U32_2 > > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 > > @@ -74,6 +77,21 @@ struct vfs_cap_data { > > } data[VFS_CAP_U32]; > > }; > > > > +#define VFS_NS_CAP_EFFECTIVE0x1 > > +/* > > + * 32-bit hdr_info contains > > + * 16 leftmost: reserved > > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) > > + * last 8: version > > + */ > > +struct vfs_ns_cap_data { > > + __le32 magic_etc; > > + struct { > > + __le32 permitted;/* Little endian */ > > + __le32 inheritable; /* Little endian */ > > + } data[VFS_CAP_U32]; > > +}; > > This is identical to vfs_cap_data. Is there a reason not to re-use the > existing one? Hm. I used to have them completely different. ATM the only difference is what goes into the magic_etc, and that not very (different). So yeah it probably should be re-used. > > + > > #ifndef __KERNEL__ > > > > /* > > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h > > index 1590c49..67c80ab 100644 > > --- a/include/uapi/linux/xattr.h > > +++ b/include/uapi/linux/xattr.h > > @@ -68,6 +68,9 @@ > > #define XATTR_CAPS_SUFFIX "capability" > > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX > > > > +#define XATTR_NS_CAPS_SUFFIX "nscapability" > > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX > > + > > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" > > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX > > XATTR_POSIX_ACL_ACCESS > > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" > > Are these documented anywhere in Documentation/ or in man pages? This > seems like it'd need a man-page update too. Yeah, if we decide we're ok with this strategy I'll update those. > > diff --git a/security/commoncap.c b/security/commoncap.c > > index 48071ed..8f3f34a 100644 > > --- a/security/commoncap.c > > +++ b/security/commoncap.c > > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) > > if (!inode->i_op->getxattr) > >return 0; > > > > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, 0); > > + if (error > 0) > > + return 1; > > + > > error = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0); > > if (error <= 0) > > return 0; > > I think this might be more readable if the getxattr calls were > standardized (one returns 1, the other returns 0, with inverted tests > -- hard to read). IIUC, if either returns > 0, return 1, otherwise > return 0. Hm. Yeah I can see where that would be confusing. I can change the flow to make the checks the same. > > @@ -330,11 +334,17 @@ int
Re: [PATCH 1/1] simplified security.nscapability xattr
On Fri, Apr 22, 2016 at 10:26 AM,wrote: > From: Serge Hallyn > > This can only be set by root in his own namespace, and will > only be respected by namespaces with that same root kuid > mapped as root, or namespaces descended from it. > > This allows a simple setxattr to work, allows tar/untar to > work, and allows us to tar in one namespace and untar in > another while preserving the capability, without risking > leaking privilege into a parent namespace. The concept seems sensible to me. Various notes below... > > Signed-off-by: Serge Hallyn > --- > include/linux/capability.h |5 ++- > include/uapi/linux/capability.h | 18 > include/uapi/linux/xattr.h |3 ++ > security/commoncap.c| 91 > +-- > 4 files changed, 112 insertions(+), 5 deletions(-) > > diff --git a/include/linux/capability.h b/include/linux/capability.h > index 00690ff..cf533ff 100644 > --- a/include/linux/capability.h > +++ b/include/linux/capability.h > @@ -13,7 +13,7 @@ > #define _LINUX_CAPABILITY_H > > #include > - > +#include > > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { > kernel_cap_t inheritable; > }; > > +#define NS_CAPS_VERSION(x) (x & 0xFF) > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) > + > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) > > diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h > index 12c37a1..f0b4a66 100644 > --- a/include/uapi/linux/capability.h > +++ b/include/uapi/linux/capability.h > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { > #define VFS_CAP_U32_2 2 > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) > > +/* version number for security.nscapability xattrs hdr->hdr_info */ > +#define VFS_NS_CAP_REVISION 1 > + > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 > #define VFS_CAP_U32 VFS_CAP_U32_2 > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 > @@ -74,6 +77,21 @@ struct vfs_cap_data { > } data[VFS_CAP_U32]; > }; > > +#define VFS_NS_CAP_EFFECTIVE0x1 > +/* > + * 32-bit hdr_info contains > + * 16 leftmost: reserved > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) > + * last 8: version > + */ > +struct vfs_ns_cap_data { > + __le32 magic_etc; > + struct { > + __le32 permitted;/* Little endian */ > + __le32 inheritable; /* Little endian */ > + } data[VFS_CAP_U32]; > +}; This is identical to vfs_cap_data. Is there a reason not to re-use the existing one? > + > #ifndef __KERNEL__ > > /* > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h > index 1590c49..67c80ab 100644 > --- a/include/uapi/linux/xattr.h > +++ b/include/uapi/linux/xattr.h > @@ -68,6 +68,9 @@ > #define XATTR_CAPS_SUFFIX "capability" > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX > > +#define XATTR_NS_CAPS_SUFFIX "nscapability" > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX > + > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX > XATTR_POSIX_ACL_ACCESS > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" Are these documented anywhere in Documentation/ or in man pages? This seems like it'd need a man-page update too. > diff --git a/security/commoncap.c b/security/commoncap.c > index 48071ed..8f3f34a 100644 > --- a/security/commoncap.c > +++ b/security/commoncap.c > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) > if (!inode->i_op->getxattr) >return 0; > > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, 0); > + if (error > 0) > + return 1; > + > error = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0); > if (error <= 0) > return 0; I think this might be more readable if the getxattr calls were standardized (one returns 1, the other returns 0, with inverted tests -- hard to read). IIUC, if either returns > 0, return 1, otherwise return 0. > @@ -330,11 +334,17 @@ int cap_inode_need_killpriv(struct dentry *dentry) > int cap_inode_killpriv(struct dentry *dentry) > { > struct inode *inode = d_backing_inode(dentry); > + int ret1, ret2; > > if (!inode->i_op->removexattr) >return 0; > > - return inode->i_op->removexattr(dentry, XATTR_NAME_CAPS); > + ret1 = inode->i_op->removexattr(dentry, XATTR_NAME_CAPS); > + ret2 = inode->i_op->removexattr(dentry, XATTR_NAME_NS_CAPS); > + > + if (ret1 != 0) > + return ret1; > + return ret2; > } > > /* > @@ -438,6 +448,65 @@ int
Re: [PATCH 1/1] simplified security.nscapability xattr
On Fri, Apr 22, 2016 at 10:26 AM, wrote: > From: Serge Hallyn > > This can only be set by root in his own namespace, and will > only be respected by namespaces with that same root kuid > mapped as root, or namespaces descended from it. > > This allows a simple setxattr to work, allows tar/untar to > work, and allows us to tar in one namespace and untar in > another while preserving the capability, without risking > leaking privilege into a parent namespace. The concept seems sensible to me. Various notes below... > > Signed-off-by: Serge Hallyn > --- > include/linux/capability.h |5 ++- > include/uapi/linux/capability.h | 18 > include/uapi/linux/xattr.h |3 ++ > security/commoncap.c| 91 > +-- > 4 files changed, 112 insertions(+), 5 deletions(-) > > diff --git a/include/linux/capability.h b/include/linux/capability.h > index 00690ff..cf533ff 100644 > --- a/include/linux/capability.h > +++ b/include/linux/capability.h > @@ -13,7 +13,7 @@ > #define _LINUX_CAPABILITY_H > > #include > - > +#include > > #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3 > #define _KERNEL_CAPABILITY_U32S_LINUX_CAPABILITY_U32S_3 > @@ -31,6 +31,9 @@ struct cpu_vfs_cap_data { > kernel_cap_t inheritable; > }; > > +#define NS_CAPS_VERSION(x) (x & 0xFF) > +#define NS_CAPS_FLAGS(x) ((x >> 8) & 0xFF) > + > #define _USER_CAP_HEADER_SIZE (sizeof(struct __user_cap_header_struct)) > #define _KERNEL_CAP_T_SIZE (sizeof(kernel_cap_t)) > > diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h > index 12c37a1..f0b4a66 100644 > --- a/include/uapi/linux/capability.h > +++ b/include/uapi/linux/capability.h > @@ -62,6 +62,9 @@ typedef struct __user_cap_data_struct { > #define VFS_CAP_U32_2 2 > #define XATTR_CAPS_SZ_2 (sizeof(__le32)*(1 + 2*VFS_CAP_U32_2)) > > +/* version number for security.nscapability xattrs hdr->hdr_info */ > +#define VFS_NS_CAP_REVISION 1 > + > #define XATTR_CAPS_SZ XATTR_CAPS_SZ_2 > #define VFS_CAP_U32 VFS_CAP_U32_2 > #define VFS_CAP_REVISION VFS_CAP_REVISION_2 > @@ -74,6 +77,21 @@ struct vfs_cap_data { > } data[VFS_CAP_U32]; > }; > > +#define VFS_NS_CAP_EFFECTIVE0x1 > +/* > + * 32-bit hdr_info contains > + * 16 leftmost: reserved > + * next 8: flags (only VFS_NS_CAP_EFFECTIVE so far) > + * last 8: version > + */ > +struct vfs_ns_cap_data { > + __le32 magic_etc; > + struct { > + __le32 permitted;/* Little endian */ > + __le32 inheritable; /* Little endian */ > + } data[VFS_CAP_U32]; > +}; This is identical to vfs_cap_data. Is there a reason not to re-use the existing one? > + > #ifndef __KERNEL__ > > /* > diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h > index 1590c49..67c80ab 100644 > --- a/include/uapi/linux/xattr.h > +++ b/include/uapi/linux/xattr.h > @@ -68,6 +68,9 @@ > #define XATTR_CAPS_SUFFIX "capability" > #define XATTR_NAME_CAPS XATTR_SECURITY_PREFIX XATTR_CAPS_SUFFIX > > +#define XATTR_NS_CAPS_SUFFIX "nscapability" > +#define XATTR_NAME_NS_CAPS XATTR_SECURITY_PREFIX XATTR_NS_CAPS_SUFFIX > + > #define XATTR_POSIX_ACL_ACCESS "posix_acl_access" > #define XATTR_NAME_POSIX_ACL_ACCESS XATTR_SYSTEM_PREFIX > XATTR_POSIX_ACL_ACCESS > #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default" Are these documented anywhere in Documentation/ or in man pages? This seems like it'd need a man-page update too. > diff --git a/security/commoncap.c b/security/commoncap.c > index 48071ed..8f3f34a 100644 > --- a/security/commoncap.c > +++ b/security/commoncap.c > @@ -313,6 +313,10 @@ int cap_inode_need_killpriv(struct dentry *dentry) > if (!inode->i_op->getxattr) >return 0; > > + error = inode->i_op->getxattr(dentry, XATTR_NAME_NS_CAPS, NULL, 0); > + if (error > 0) > + return 1; > + > error = inode->i_op->getxattr(dentry, XATTR_NAME_CAPS, NULL, 0); > if (error <= 0) > return 0; I think this might be more readable if the getxattr calls were standardized (one returns 1, the other returns 0, with inverted tests -- hard to read). IIUC, if either returns > 0, return 1, otherwise return 0. > @@ -330,11 +334,17 @@ int cap_inode_need_killpriv(struct dentry *dentry) > int cap_inode_killpriv(struct dentry *dentry) > { > struct inode *inode = d_backing_inode(dentry); > + int ret1, ret2; > > if (!inode->i_op->removexattr) >return 0; > > - return inode->i_op->removexattr(dentry, XATTR_NAME_CAPS); > + ret1 = inode->i_op->removexattr(dentry, XATTR_NAME_CAPS); > + ret2 = inode->i_op->removexattr(dentry, XATTR_NAME_NS_CAPS); > + > + if (ret1 != 0) > + return ret1; > + return ret2; > } > > /* > @@ -438,6 +448,65 @@ int get_vfs_caps_from_disk(const struct dentry *dentry, > struct cpu_vfs_cap_data
Re: [PATCH 1/1] simplified security.nscapability xattr
On Fri, Apr 22, 2016 at 12:26:33PM -0500, serge.hal...@ubuntu.com wrote: > From: Serge Hallyn> > This can only be set by root in his own namespace, and will > only be respected by namespaces with that same root kuid > mapped as root, or namespaces descended from it. > > This allows a simple setxattr to work, allows tar/untar to > work, and allows us to tar in one namespace and untar in > another while preserving the capability, without risking > leaking privilege into a parent namespace. > > Signed-off-by: Serge Hallyn Seems like the simplest possible design which meets the requirements. Acked-by: Seth Forshee
Re: [PATCH 1/1] simplified security.nscapability xattr
On Fri, Apr 22, 2016 at 12:26:33PM -0500, serge.hal...@ubuntu.com wrote: > From: Serge Hallyn > > This can only be set by root in his own namespace, and will > only be respected by namespaces with that same root kuid > mapped as root, or namespaces descended from it. > > This allows a simple setxattr to work, allows tar/untar to > work, and allows us to tar in one namespace and untar in > another while preserving the capability, without risking > leaking privilege into a parent namespace. > > Signed-off-by: Serge Hallyn Seems like the simplest possible design which meets the requirements. Acked-by: Seth Forshee