Re: glibc [BZ #22145]: {p,ty}fds and mount namespaces

2017-10-10 Thread Christian Brauner
On Tue, Oct 10, 2017 at 5:44 PM, Chet Ramey wrote:
> On 10/9/17 10:37 AM, Christian Brauner wrote:
>
>> A common scenario where this happens is with /dev/console in containers.
>> Usually container runtimes/managers will call openpty() on a ptmx device in the
>> host's mount namespace to safely allocate a {p,t}ty master-slave pair since they
>> can't trust the container's devpts mount after the container's init binary has
>> started (potentially malicious fuse mounts and what not).  The slave {p,t}ty fd
>> will then usually be sent to the container and bind-mounted over the container's
>> /dev/console which in this scenario is simply a regular file. This is especially
>> common with unprivileged containers where mknod() syscalls are not possible. In
>> this scenario ttyname{_r}() will correctly report that /dev/console does in fact
>> refer to a {p,t}ty device whose path exists in the current mount namespace but
>> whose origin is a devpts mount in a different mount namespace. Bash however
>> seems to not like this at all and fails to initialize job control correctly. In
>> case you have lxc available this is simply reproducible by creating an
>> unprivileged container and calling lxc-execute -n  -- bash.  If
>> you could look into this and whether that makes sense to you it'd be greatly
>> appreciated.
>
> Bash doesn't try to open /dev/console. It will, however, try to open
> /dev/tty and, if that fails, call ttyname() to get the pathname of a
> terminal device to pass to open(). The idea is that if you're started
> without a controlling terminal, the first terminal device you open becomes
> your controlling terminal. However, if that fails, job control will
> eventually be disabled -- you can't have job control without a controlling
> terminal.
>
> Under the circumstances described in the original bug report, bash attempts
> to use stderr as its controlling terminal (having already called isatty and
> been told that it's a terminal), discovers that it cannot set the process
> group on that, and disables job control. If you can't set the process group
> on what you think is your controlling terminal, you're not going to be able
> to do job control, period.
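
The sequence Chet describes can be sketched roughly as follows. This is not
bash's actual source: the function names open_terminal() and
is_controlling_terminal() are made up for illustration, and the second helper
uses tcgetpgrp() as a read-only stand-in for the "can I set the process group
on this terminal" check, on the assumption that both fail when the fd is not
the caller's controlling terminal.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try /dev/tty first; if that fails, fall back to
 * the pathname that ttyname() reports for fallback_fd.
 * Returns an open file descriptor, or -1 if no terminal could be opened. */
static int open_terminal(int fallback_fd)
{
    int fd = open("/dev/tty", O_RDWR);
    if (fd >= 0)
        return fd;

    const char *name = ttyname(fallback_fd);
    if (name == NULL)
        return -1;

    /* O_NOCTTY is deliberately not set: on Linux a session leader that
     * has no controlling terminal acquires this one by opening it. */
    return open(name, O_RDWR);
}

/* Hypothetical helper: tcgetpgrp() fails (ENOTTY, or EBADF for a bad fd)
 * unless fd refers to the controlling terminal of the calling process.
 * Returns 1 if job control looks possible on fd, 0 otherwise. */
static int is_controlling_terminal(int fd)
{
    return tcgetpgrp(fd) != (pid_t)-1;
}
```

A shell following this pattern would call open_terminal(STDERR_FILENO) and
disable job control if is_controlling_terminal() returns 0 for the result.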

Right, what I found confusing at first was that there was in fact a
controlling terminal but bash still didn't initialize job control. I have
now traced it down to some programs either being unable to call setsid() or
not calling setsid() before exec()ing bash in the child after a fork(). This
is what causes job control to fail.
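
For reference, a minimal sketch of the fork()/setsid()/exec() ordering I mean.
The names spawn_in_new_session() and wait_for_exit() are hypothetical, the exit
codes 126 and 127 are just a convention borrowed from shells, and
ioctl(TIOCSCTTY) is the Linux way of making slave_fd the controlling terminal
of the new session. It is a sketch under those assumptions, not what any
particular launcher does.

```c
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] in a new session whose controlling
 * terminal is slave_fd. Returns the child's pid, or -1 if fork() failed. */
static pid_t spawn_in_new_session(int slave_fd, char *const argv[])
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                 /* parent, or -1 on fork() failure */

    /* Child: become a session leader. This also detaches the child from
     * any controlling terminal it inherited. */
    if (setsid() < 0)
        _exit(126);

    /* Make slave_fd the controlling terminal of the new session. */
    if (ioctl(slave_fd, TIOCSCTTY, 0) < 0)
        _exit(126);

    /* Use the terminal as stdin, stdout and stderr. */
    if (dup2(slave_fd, STDIN_FILENO) < 0 ||
        dup2(slave_fd, STDOUT_FILENO) < 0 ||
        dup2(slave_fd, STDERR_FILENO) < 0)
        _exit(126);
    if (slave_fd > STDERR_FILENO)
        close(slave_fd);

    execvp(argv[0], argv);
    _exit(127);                     /* only reached if exec failed */
}

/* Hypothetical helper: wait for pid and return its exit status,
 * or -1 if it did not exit normally. */
static int wait_for_exit(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the setsid() or the TIOCSCTTY step is skipped, the exec()ed shell has no
controlling terminal of its own to set a process group on.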

Thanks Chet!
Christian

>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



glibc [BZ #22145]: {p,ty}fds and mount namespaces

2017-10-09 Thread Christian Brauner
Hi,

We've received a bug report against glibc [1] relating to {p,t}ty file
descriptors from devpts mounts in different mount namespaces. If
ttyname{_r}() detects that the path for a pty slave file descriptor (e.g.
/dev/pts/4) does not exist in the caller's mount namespace, or that the path
exists by pure chance (because someone has e.g. opened five {p,t}tys in the
current mount namespace) but in fact refers to a different device, then
ttyname{_r}() will set/return ENODEV. On Linux the caller can treat this as a
hint that the {p,t}ty file descriptor's path does not exist in the current mount
namespace. However, the case where the path for the {p,t}ty file descriptor
does actually exist in the current mount namespace, although the {p,t}ty fd
belongs to a devpts mount in another mount namespace, seems to confuse bash
such that it fails to initialize job control correctly. This at least is my
current analysis of the problem.

A common scenario where this happens is with /dev/console in containers.
Usually container runtimes/managers will call openpty() on a ptmx device in the
host's mount namespace to safely allocate a {p,t}ty master-slave pair since they
can't trust the container's devpts mount after the container's init binary has
started (potentially malicious fuse mounts and what not).  The slave {p,t}ty fd
will then usually be sent to the container and bind-mounted over the container's
/dev/console which in this scenario is simply a regular file. This is especially
common with unprivileged containers where mknod() syscalls are not possible. In
this scenario ttyname{_r}() will correctly report that /dev/console does in fact
refer to a {p,t}ty device whose path exists in the current mount namespace but
whose origin is a devpts mount in a different mount namespace. Bash, however,
does not seem to like this at all and fails to initialize job control correctly.
If you have lxc available, this is easily reproducible by creating an
unprivileged container and calling lxc-execute -n  -- bash.  If
you could look into this and whether that makes sense to you it'd be greatly
appreciated.
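
To make the ttyname_r() outcomes above concrete, here is a small sketch of how
a caller might classify them. The enum and the function lookup_tty() are
hypothetical names for illustration only. Note that ttyname_r() returns the
error number directly rather than setting errno, and that reading ENODEV as
"the path lives in another mount namespace" is a Linux-specific hint, not a
guarantee.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical classification of ttyname_r() results. */
enum tty_lookup {
    TTY_NAMED,           /* buf holds a path valid in this mount namespace */
    TTY_OTHER_NAMESPACE, /* ENODEV: a tty, but no matching path found here */
    TTY_NOT_A_TTY,       /* ENOTTY: fd is not a terminal at all */
    TTY_ERROR            /* anything else, e.g. EBADF or ERANGE */
};

static enum tty_lookup lookup_tty(int fd, char *buf, size_t len)
{
    int err = ttyname_r(fd, buf, len);

    if (err == 0)
        return TTY_NAMED;
    if (err == ENODEV)
        return TTY_OTHER_NAMESPACE;   /* hint only, see lead-in */
    if (err == ENOTTY)
        return TTY_NOT_A_TTY;
    return TTY_ERROR;
}
```

In the /dev/console scenario described above, a call like
lookup_tty(STDERR_FILENO, buf, sizeof(buf)) would be expected to yield
TTY_NAMED with "/dev/console" in buf, which is exactly the case that bash
appears to stumble over.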

Fwiw, zsh does not seem to run into a problem here.

Thanks
Christian

[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=22145