> On Jan. 14, 2019, 8:31 a.m., Qian Zhang wrote:
> > src/linux/seccomp/seccomp.cpp
> > Lines 137-139 (patched)
> > <https://reviews.apache.org/r/68018/diff/14/?file=2117423#file2117423line137>
> >
> >     Will this affect the task run by Mesos? E.g., a task may want to run a 
> > program which has `set-user-ID` bit.
> 
> Andrei Budnik wrote:
>     Yes, `no_new_privs` flag affects the task that wants to run a program 
> which has `set-user-ID` bit.
>     E.g., launching a `ping -c 3 8.8.8.8` fails with seccomp. You'll see a 
> message in executor logs:
>     ```
>     I0114 07:19:21.887670 13264 executor.cpp:706] Forked command at 13276
>     ping: socket: Operation not permitted
>     I0114 07:19:22.055352 13263 executor.cpp:1007] Command exited with status 
> 2 (pid: 13276)
>     ```
>     
>     Also, see my previous comment 
> https://reviews.apache.org/r/68018/#comment297000
> 
> Qian Zhang wrote:
>     In your previous comment, you mentioned that Docker daemon launches its 
> containers with `SCMP_FLTATR_CTL_NNP` flag set by default, does that mean any 
> containers launched by Docker daemon cannot run program which has set-user-ID 
> bit?
>     
>     This seems unfortunate since it might break some use cases or 
> applications that we already supported. And can you please elaborate a bit 
> about `"Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp 
> filter can be reverted anytime"`? How will the Seccomp filter be reverted? Do 
> you mean the task launched by Mesos can call libseccomp API to revert the 
> filter itself?
>     
>     If we have to live with this limitation (i.e., cannot run program which 
> has set-user-ID bit), then we need to highlight it in the document.
> 
> Gilbert Song wrote:
>     Seems like we asked the same question.
>     
>     Andrei, let align on this thread? :/thanks:)

>does that mean any containers launched by Docker daemon cannot run program 
>which has set-user-ID bit?

Docker daemon cannot be used to run arbitrary programs (as opposed to the Mesos 
containerizer). So, when one launches a Docker container, Docker daemon launches a 
container process with the `NNP` bit set, which means that a container process (and 
its descendants) can't gain more privileges **outside** its container. Mesos 
containerizer has exactly the same behaviour:

1) Run system-provided `/bin/ping` (*outside* its container) as a 
non-privileged user:
```
$ ./src/mesos-execute --master="`hostname`:5050" --name="a" 
--containerizer=mesos --command="ping -c 3 8.8.8.8"
...
Received status update TASK_FAILED for task 'a'
  message: 'Command exited with status 2'
  source: SOURCE_EXECUTOR
```

2) Run system-provided `/bin/ping` (*outside* its container) as a privileged 
user:
```
$ sudo ./src/mesos-execute --master="`hostname`:5050" --name="a" 
--containerizer=mesos --command="ping -c 3 8.8.8.8"
...
Received status update TASK_FINISHED for task 'a'
  message: 'Command exited with status 0'
  source: SOURCE_EXECUTOR
```

3) Run container image provided `ping` (*inside* its image/container) as a 
non-privileged user:
```
$ ./src/mesos-execute --master="`hostname`:5050" --name="a" 
--containerizer=mesos --docker_image="fedora:latest" --command="yum -y install 
iputils;ping -c 3 8.8.8.8"
...
Received status update TASK_FINISHED for task 'a'
  message: 'Command exited with status 0'
  source: SOURCE_EXECUTOR

$ cat /path/to/container/stdout
...
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=122 time=13.9 ms
```
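
To check which of the cases above actually ran with the bit set, one option is 
to read the `NoNewPrivs` field that newer Linux kernels expose in 
`/proc/<pid>/status`. Below is a minimal sketch, not part of the patch; the 
helper names `parse_no_new_privs` and `no_new_privs_of` are made up for 
illustration, and the field is assumed to be present (it is missing on older 
kernels, in which case the sketch returns `None`).

```python
def parse_no_new_privs(status_text):
    """Return True/False for the NoNewPrivs field, or None if it is absent.

    `status_text` is the content of a /proc/<pid>/status file.
    """
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key == "NoNewPrivs":
            return value.strip() == "1"
    return None


def no_new_privs_of(pid="self"):
    """Read /proc/<pid>/status (Linux only) and report the NNP bit."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_no_new_privs(status_file.read())
```

Calling `no_new_privs_of(13276)` with the pid of the forked command (taken from 
the executor log) would report whether that task inherited the bit.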

> This seems unfortunate since it might break some use cases or applications 
> that we already supported.

It's very unlikely that the agent launches tasks whose binary has the 
`setuid`/`setgid` bit set. Because... what would be the point?
I doubt that any of the following programs are launched as a Mesos container:
```
$ sudo find /bin/ -perm -u=s -type f 2>/dev/null
/bin/newgrp
/bin/pkexec
/bin/mount
/bin/umount
/bin/newuidmap
/bin/newgidmap
/bin/sudo
/bin/crontab
/bin/su
/bin/gpasswd
/bin/chage
/bin/passwd
/bin/staprun
/bin/fusermount
/bin/fusermount-glusterfs
/bin/chfn
/bin/chsh
/bin/at
```
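
The same check can be scripted for a task binary before launching it. This is 
only a sketch under a few assumptions: `is_setid` and `find_setid_files` are 
hypothetical helpers, the scan is not recursive (unlike `find`), and it reports 
the `setgid` bit as well as `setuid`. It also does not look at file 
capabilities, which `no_new_privs` prevents from being gained on `execve` too, 
so a binary can be affected without showing up here.

```python
import os
import stat


def is_setid(mode):
    """True if a file mode has the set-user-ID or set-group-ID bit."""
    return bool(mode & (stat.S_ISUID | stat.S_ISGID))


def find_setid_files(directory):
    """List regular files directly inside `directory` with setuid/setgid set."""
    hits = []
    for entry in os.scandir(directory):
        try:
            st = entry.stat(follow_symlinks=False)
        except OSError:
            continue  # entry vanished or is unreadable; skip it
        if stat.S_ISREG(st.st_mode) and is_setid(st.st_mode):
            hits.append(entry.path)
    return sorted(hits)
```

For example, `find_setid_files("/bin")` gives a list comparable to the `find` 
output above, and `is_setid(os.stat(path).st_mode)` checks a single binary.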

> And can you please elaborate a bit about "Disabling SCMP_FLTATR_CTL_NNP flag 
> for a root means that Seccomp filter can be reverted anytime"? How will the 
> Seccomp filter be reverted? Do you mean the task launched by Mesos can call 
> libseccomp API to revert the filter itself?

Yes, without the `NNP` (`no_new_privs`) bit set, a privileged task might call 
the `seccomp` Linux syscall to install an empty Seccomp filter.
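
For reference, `no_new_privs` is a per-process attribute that is set via 
`prctl(2)`; according to the kernel documentation it is inherited across `fork` 
and `execve` and cannot be unset once set. The sketch below only illustrates 
the bit itself from Python through `ctypes`; it is not how the patch or 
`libseccomp` is written, the helper names are hypothetical, and it assumes a 
Linux host with a libc that exports `prctl`.

```python
import ctypes
import os
import sys

# Option values from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39


def _prctl(option, arg2=0):
    """Call prctl(option, arg2, 0, 0, 0) and raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.prctl(
        ctypes.c_int(option),
        ctypes.c_ulong(arg2),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def set_no_new_privs():
    """Set the no_new_privs bit for the calling process (irreversible)."""
    _prctl(PR_SET_NO_NEW_PRIVS, 1)


def get_no_new_privs():
    """Return 1 if no_new_privs is set for the calling process, else 0."""
    return _prctl(PR_GET_NO_NEW_PRIVS)
```

After `set_no_new_privs()`, `get_no_new_privs()` returns 1 for the process and 
for everything it forks or executes afterwards.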


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68018/#review211946
-----------------------------------------------------------


On Nov. 8, 2018, 3:24 p.m., Andrei Budnik wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68018/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2018, 3:24 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, James Peach, and Qian Zhang.
> 
> 
> Bugs: MESOS-9034
>     https://issues.apache.org/jira/browse/MESOS-9034
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> `SeccompFilter` class is a wrapper for `libseccomp` API. Its main
> purpose is to provide a translation of the `ContainerSeccompProfile`
> message into calls of `libseccomp` API.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt a574d449dc26b820cbef7ff0b5e94b42b6fe86cf 
>   src/Makefile.am cd785255fcdf1302a8f9fa358039e5d1f200e132 
>   src/linux/seccomp/seccomp.hpp PRE-CREATION 
>   src/linux/seccomp/seccomp.cpp PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68018/diff/15/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andrei Budnik
> 
>
