> On Jan. 14, 2019, 8:31 a.m., Qian Zhang wrote:
> > src/linux/seccomp/seccomp.cpp
> > Lines 137-139 (patched)
> > <https://reviews.apache.org/r/68018/diff/14/?file=2117423#file2117423line137>
> >
> >     Will this affect the task run by Mesos? E.g., a task may want to run a
> >     program which has `set-user-ID` bit.
>
> Andrei Budnik wrote:
>     Yes, `no_new_privs` flag affects the task that wants to run a program
>     which has `set-user-ID` bit.
>     E.g., launching a `ping -c 3 8.8.8.8` fails with seccomp. You'll see a
>     message in executor logs:
>     ```
>     I0114 07:19:21.887670 13264 executor.cpp:706] Forked command at 13276
>     ping: socket: Operation not permitted
>     I0114 07:19:22.055352 13263 executor.cpp:1007] Command exited with status 2 (pid: 13276)
>     ```
>
>     Also, see my previous comment
>     https://reviews.apache.org/r/68018/#comment297000
>
> Qian Zhang wrote:
>     In your previous comment, you mentioned that Docker daemon launches its
>     containers with `SCMP_FLTATR_CTL_NNP` flag set by default, does that mean any
>     containers launched by Docker daemon cannot run program which has set-user-ID
>     bit?
>
>     This seems unfortunate since it might break some use cases or
>     applications that we already supported. And can you please elaborate a bit
>     about `"Disabling SCMP_FLTATR_CTL_NNP flag for a root means that Seccomp
>     filter can be reverted anytime"`? How will the Seccomp filter be reverted? Do
>     you mean the task launched by Mesos can call libseccomp API to revert the
>     filter itself?
>
>     If we have to live with this limitation (i.e., cannot run program which
>     has set-user-ID bit), then we need to highlight it in the document.
>
> Gilbert Song wrote:
>     Seems like we asked the same question.
>
>     Andrei, let align on this thread? :/thanks:)

> does that mean any containers launched by Docker daemon cannot run program
> which has set-user-ID bit?

The Docker daemon cannot be used to run arbitrary programs (as opposed to the Mesos containerizer). So, when one launches a Docker container, the Docker daemon launches the container process with the `NNP` bit set, which means that the container process (and its descendants) can't gain more privileges **outside** its container. The Mesos containerizer has exactly the same behaviour:

1) Run system-provided `/bin/ping` (*outside* its container) as a non-privileged user:
```
$ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
...
Received status update TASK_FAILED for task 'a'
  message: 'Command exited with status 2'
  source: SOURCE_EXECUTOR
```

2) Run system-provided `/bin/ping` (*outside* its container) as a privileged user:
```
$ sudo ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --command="ping -c 3 8.8.8.8"
...
Received status update TASK_FINISHED for task 'a'
  message: 'Command exited with status 0'
  source: SOURCE_EXECUTOR
```

3) Run container image provided `ping` (*inside* its image/container) as a non-privileged user:
```
$ ./src/mesos-execute --master="`hostname`:5050" --name="a" --containerizer=mesos --docker_image="fedora:latest" --command="yum -y install iputils;ping -c 3 8.8.8.8"
...
Received status update TASK_FINISHED for task 'a'
  message: 'Command exited with status 0'
  source: SOURCE_EXECUTOR

$ cat /path/to/container/stdout
...
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=122 time=13.9 ms
```

> This seems unfortunate since it might break some use cases or applications
> that we already supported.

It's very unlikely that the agent launches tasks whose binary has the `setuid`/`setgid` bit set. Because... what's the point?
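
As a rough illustration of what "has the `setuid`/`setgid` bit" means in practice, here is a minimal Python sketch that tests a file's mode bits. These are the bits that `execve` stops honouring once `no_new_privs` is set. The helper names `mode_has_setid` and `has_setid_bit` are hypothetical and are not part of Mesos or of this patch.

```python
import os
import stat
import sys


def mode_has_setid(mode: int) -> bool:
    """Return True if the mode carries the set-user-ID or set-group-ID bit."""
    return bool(mode & (stat.S_ISUID | stat.S_ISGID))


def has_setid_bit(path: str) -> bool:
    """Check a file on disk. os.stat follows symlinks, so a symlink is
    judged by the mode of its target."""
    return mode_has_setid(os.stat(path).st_mode)


if __name__ == "__main__":
    # Report each path given on the command line.
    for path in sys.argv[1:]:
        print(path, has_setid_bit(path))
```

Running it against a task's binary (for example, the path that `which ping` reports) gives a quick hint about whether that task could be affected by the `NNP` bit.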
I doubt that any of the following programs are launched as a Mesos container:
```
$ sudo find /bin/ -perm -u=s -type f 2>/dev/null
/bin/newgrp
/bin/pkexec
/bin/mount
/bin/umount
/bin/newuidmap
/bin/newgidmap
/bin/sudo
/bin/crontab
/bin/su
/bin/gpasswd
/bin/chage
/bin/passwd
/bin/staprun
/bin/fusermount
/bin/fusermount-glusterfs
/bin/chfn
/bin/chsh
/bin/at
```

> And can you please elaborate a bit about "Disabling SCMP_FLTATR_CTL_NNP flag
> for a root means that Seccomp filter can be reverted anytime"? How will the
> Seccomp filter be reverted? Do you mean the task launched by Mesos can call
> libseccomp API to revert the filter itself?

Yes, without the `NNP` (`no_new_privs`) bit set, a privileged task might call the `seccomp` Linux syscall to install an empty Seccomp filter.


- Andrei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68018/#review211946
-----------------------------------------------------------


On Nov. 8, 2018, 3:24 p.m., Andrei Budnik wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68018/
> -----------------------------------------------------------
>
> (Updated Nov. 8, 2018, 3:24 p.m.)
>
>
> Review request for mesos, Gilbert Song, Jie Yu, James Peach, and Qian Zhang.
>
>
> Bugs: MESOS-9034
>     https://issues.apache.org/jira/browse/MESOS-9034
>
>
> Repository: mesos
>
>
> Description
> -------
>
> `SeccompFilter` class is a wrapper for `libseccomp` API. Its main
> purpose is to provide a translation of the `ContainerSeccompProfile`
> message into calls of `libseccomp` API.
>
>
> Diffs
> -----
>
>   src/CMakeLists.txt a574d449dc26b820cbef7ff0b5e94b42b6fe86cf
>   src/Makefile.am cd785255fcdf1302a8f9fa358039e5d1f200e132
>   src/linux/seccomp/seccomp.hpp PRE-CREATION
>   src/linux/seccomp/seccomp.cpp PRE-CREATION
>
>
> Diff: https://reviews.apache.org/r/68018/diff/15/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Andrei Budnik
>
>