On Sat, Dec 26, 2015 at 09:55:50PM +0100, Jann Horn wrote:
> On Sat, Dec 26, 2015 at 02:23:45PM -0600, Serge E. Hallyn wrote:
> > On Sat, Dec 26, 2015 at 02:10:38AM +0100, Jann Horn wrote:
> > > On Sat, Dec 12, 2015 at 09:12:41PM +0100, Jann Horn wrote:
> > > > With this change, the entering process can first enter the
> > > > namespace and then safely inspect the namespace's
> > > > properties, e.g. through /proc/self/{uid_map,gid_map},
> > > > assuming that the namespace owner doesn't have access to
> > > > uid 0.
> > > 
> > > Actually, I think I missed something there. Well, at least it
> > > should not directly lead to a container escape.
> > > 
> > > 
> > > > -static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
> > > > +static bool ptrace_has_cap(const struct cred *tcred, unsigned int mode)
> > > >  {
> > > > +       struct user_namespace *tns = tcred->user_ns;
> > > > +       struct user_namespace *curns = current_cred()->user_ns;
> > > > +
> > > > +       /* When a root-owned process enters a user namespace created by 
> > > > a
> > > > +        * malicious user, the user shouldn't be able to execute code 
> > > > under
> > > > +        * uid 0 by attaching to the root-owned process via ptrace.
> > > > +        * Therefore, similar to the capable_wrt_inode_uidgid() check,
> > > > +        * verify that all the uids and gids of the target process are
> > > > +        * mapped into the current namespace.
> > > > +        * No fsuid/fsgid check because __ptrace_may_access doesn't do 
> > > > it
> > > > +        * either.
> > > > +        */
> > > > +       if (!kuid_has_mapping(curns, tcred->euid) ||
> > > > +                       !kuid_has_mapping(curns, tcred->suid) ||
> > > > +                       !kuid_has_mapping(curns, tcred->uid)  ||
> > > > +                       !kgid_has_mapping(curns, tcred->egid) ||
> > > > +                       !kgid_has_mapping(curns, tcred->sgid) ||
> > > > +                       !kgid_has_mapping(curns, tcred->gid))
> > > > +               return false;
> > > > +
> > > >         if (mode & PTRACE_MODE_NOAUDIT)
> > > > -               return has_ns_capability_noaudit(current, ns, 
> > > > CAP_SYS_PTRACE);
> > > > +               return has_ns_capability_noaudit(current, tns, 
> > > > CAP_SYS_PTRACE);
> > > >         else
> > > > -               return has_ns_capability(current, ns, CAP_SYS_PTRACE);
> > > > +               return has_ns_capability(current, tns, CAP_SYS_PTRACE);
> > > >  }
> > > 
> > > If the namespace owner can run code in the init namespace, the kuids are
> > > mapped into curns but he is still capable wrt the target namespace.
> > > 
> > > I think a proper fix should first determine the highest parent of
> > > tcred->user_ns in which the caller still has privs, then do the
> > > kxid_has_mapping() checks in there.
> > 
> > Hi,
> > 
> > I don't quite follow what you are concerned about.  Based on the new
> > patch you sent, I assume it's not the case where the tcred's kuid is
> > actually mapped into the container.  So is it the case where I
> > unshare a userns which unshares a userns, then setns from the grandparent
> > into the child?  And if so, the concern is that if the setns()ing task's
> > kuid is mappable all along into the grandhild, then container root should
> > be able to ptrace it?
> 
> Consider the following scenario:
> 
> init_user_ns has a child namespace (I'll call it child_ns).
> child_ns is owned by an attacker (child_ns->owner == attacker_kuid).
> The attacking process has current_cred()->euid == attacker_kuid and lives
> in init_user_ns (which means it's capable in child_ns).

Ah, right.  Special.

Thanks.

-serge
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to