> On March 15, 2018, 6:30 p.m., Zhitao Li wrote:
> > I feel that the complexity of this code justifies better user doc, possibly
> > when we create a new isolator for this?
> >
> > Also, how much of each mount should be allow to reconfigure? Should this
> > behavior be dictated for every user of Mesos containerizer?

Totally agreed that we should move the above mount points to different isolators, and that is part of my plan for the patches for [MESOS-6798](https://issues.apache.org/jira/browse/MESOS-6798). It would make more sense for the mount points under `/proc` and `/sys` (as well as some of those under `/dev`) to be moved to `filesystem/linux`.

As for whether these extra mount points should be applied to each and every Mesos container, my answer is no. They should definitely be applied to most Mesos containers for security purposes, since those are usually application containers. For more privileged containers, however, they should not be mandated.

We could consider adding a few knobs at different levels to let users tweak the behavior. For example, an extra agent flag could set the agent-level default for container security settings. Furthermore, we could consider adding an extra field to `LinuxInfo`, such as `privileged` (similar to Docker's `--privileged` flag) or something finer-grained like negated versions of the `Protect*` directives in [Systemd's sandboxing configurations](https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Sandboxing), if people need to control the security settings of Mesos containers.

I'll put the comments on the tasks themselves, so we can track this better.


> On March 15, 2018, 6:30 p.m., Zhitao Li wrote:
> > src/linux/fs.cpp
> > Lines 686-692 (original), 686-692 (patched)
> > <https://reviews.apache.org/r/66034/diff/1/?file=1974223#file1974223line686>
> >
> > Can we move the `TODO` to the sentence about follow-up? The sentence
> > `These special filesystem mount points need to be bind-mounted prior to all
> > other ...` is a comment on requirement which your follow up work would not
> > change.

Makes sense. It's worth noting, though, as I said in the other comment in this thread, that the list will eventually be moved away from this file as I polish up the mounts with other isolators.

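For reference, the read-only remounts discussed in this review can be sketched as a plan of mount operations. This is a minimal illustration in Python, not the C++ in `src/linux/fs.cpp`: it assumes the common Linux two-step of bind-mounting a path onto itself and then remounting it read-only, which may differ from what the patch actually does. The helper name `readonly_remount_plan` and its `privileged` switch are hypothetical, the flag values are the ones defined in Linux `<sys/mount.h>`, and nothing here calls `mount(2)`.

```python
# Mount flag values as defined in Linux <sys/mount.h>.
MS_RDONLY = 1
MS_NOSUID = 2
MS_NODEV = 4
MS_NOEXEC = 8
MS_REMOUNT = 32
MS_BIND = 4096

# Paths listed in the review description (taken from systemd's
# ProtectKernelTunables option).
PROTECTED_PROC_PATHS = [
    "/proc/bus",
    "/proc/fs",
    "/proc/irq",
    "/proc/sys",
    "/proc/sysrq-trigger",
]


def readonly_remount_plan(privileged=False):
    """Return (source, target, flags) tuples describing the remounts.

    Hypothetical helper: it only builds the plan and never calls mount(2).
    """
    if privileged:
        # Assumption: a privileged container would skip the extra protection.
        return []

    hardening = MS_RDONLY | MS_NOSUID | MS_NOEXEC | MS_NODEV
    plan = []
    for path in PROTECTED_PROC_PATHS:
        # Step 1: bind-mount the path onto itself so it becomes a mount point.
        plan.append((path, path, MS_BIND))
        # Step 2: remount that bind mount read-only with the hardening flags.
        plan.append((path, path, MS_BIND | MS_REMOUNT | hardening))
    return plan
```

Keeping the plan separate from the code that applies it is one way a per-container knob such as the `privileged` idea above could be wired in, since only the plan builder would need to know about the setting.
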
- Jason


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66034/#review199281
-----------------------------------------------------------


On March 15, 2018, 6:24 p.m., Jason Lai wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66034/
> -----------------------------------------------------------
>
> (Updated March 15, 2018, 6:24 p.m.)
>
>
> Review request for mesos, Eric Chung, Gilbert Song, Ian Downes, Jie Yu, James
> Peach, and Zhitao Li.
>
>
> Bugs: MESOS-8654
>     https://issues.apache.org/jira/browse/MESOS-8654
>
>
> Repository: mesos
>
>
> Description
> -------
>
> Several entries under the proc FS within Mesos containers need to be
> remounted as readonly for improved security reasons.
>
> The list should include the important ones introduced by Systemd's
> `ProtectKernelTunables` option:
>
> * `/proc/bus`
> * `/proc/fs`
> * `/proc/irq`
> * `/proc/sys`
> * `/proc/sysrq-trigger`
>
> It is particularly necessary to remount `/proc/sysrq-trigger` as
> read-only. Otherwise, it would be possible for processes running in
> containers as `root` to perform privileged operations, such as host
> reboot.
>
> Extra mount options should include `nosuid,noexec,nodev` (see also
> `mount(2)` for detailed explanations of the options).
>
>
> Diffs
> -----
>
>   src/linux/fs.cpp ed26f80ef7315809a1df9f2c50b4fe3445810f8a
>
>
> Diff: https://reviews.apache.org/r/66034/diff/1/
>
>
> Testing
> -------
>
> The mount table of the container launched by the patched version of
> `mesos-containerizer launch` include the entries listed below, with
> `nosuid,noexec,nodev` mount options:
> ```
> $ sudo unshare -m -p -f /usr/local/libexec/mesos/mesos-containerizer launch --launch_info="$(jq -c . launch_info.json)" --runtime_directory="$(pwd)"
> Marked '/' as rslave
> Prepared mount '{"flags":20480,"source":"\/etc\/hostname","target":"\/home\/jlai\/containers\/rootfs\/etc\/hostname"}'
> Prepared mount '{"flags":20480,"source":"\/etc\/hosts","target":"\/home\/jlai\/containers\/rootfs\/etc\/hosts"}'
> Prepared mount '{"flags":20480,"source":"\/etc\/resolv.conf","target":"\/home\/jlai\/containers\/rootfs\/etc\/resolv.conf"}'
> Changing root to /home/jlai/containers/rootfs
> bash-4.4# findmnt -a
> TARGET                  SOURCE                      FSTYPE  OPTIONS
> /                       alpine                      overlay rw,relatime,lowerdir=overlay/lower,upperdir=overlay/upper,workdir=overlay/work
> |-/etc/hostname         /dev/dm-0[/etc/hostname]    ext4    rw,noatime,errors=panic,data=ordered
> |-/etc/hosts            /dev/dm-0[/etc/hosts]       ext4    rw,noatime,errors=panic,data=ordered
> |-/etc/resolv.conf      /dev/dm-0[/etc/resolv.conf] ext4    rw,noatime,errors=panic,data=ordered
> |-/proc                 proc                        proc    rw,nosuid,nodev,noexec,relatime
> | |-/proc/bus           proc[/bus]                  proc    ro,nosuid,nodev,noexec,relatime
> | |-/proc/fs            proc[/fs]                   proc    ro,nosuid,nodev,noexec,relatime
> | |-/proc/irq           proc[/irq]                  proc    ro,nosuid,nodev,noexec,relatime
> | |-/proc/sys           proc[/sys]                  proc    ro,nosuid,nodev,noexec,relatime
> | `-/proc/sysrq-trigger proc[/sysrq-trigger]        proc    ro,nosuid,nodev,noexec,relatime
> |-/sys                  sysfs                       sysfs   ro,nosuid,nodev,noexec,relatime
> `-/dev                  tmpfs                       tmpfs   rw,nosuid,noexec,mode=755
>   |-/dev/pts            devpts                      devpts  rw,nosuid,noexec,relatime,mode=600,ptmxmode=666
>   `-/dev/shm            tmpfs                       tmpfs   rw,nosuid,nodev
> ```
>
>
> Thanks,
>
> Jason Lai
>
>
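
As a reading aid for the launch log quoted above: the `"flags":20480` value in each `Prepared mount` line is a bitmask that can be decoded with the flag values from Linux `<sys/mount.h>`. The sketch below is illustrative only; `decode_mount_flags` is a hypothetical helper, not part of Mesos, and the table covers just the flags relevant to this review.

```python
# Subset of mount flag values as defined in Linux <sys/mount.h>,
# listed in ascending bit order.
MOUNT_FLAGS = {
    "MS_RDONLY": 1,
    "MS_NOSUID": 2,
    "MS_NODEV": 4,
    "MS_NOEXEC": 8,
    "MS_REMOUNT": 32,
    "MS_BIND": 4096,
    "MS_REC": 16384,
}


def decode_mount_flags(value):
    """Return the names of the known flags that are set in `value`.

    Hypothetical helper; bits outside MOUNT_FLAGS are silently ignored.
    """
    return [name for name, bit in MOUNT_FLAGS.items() if value & bit]


# The value printed in the "Prepared mount" lines of the log.
print(decode_mount_flags(20480))
```

Decoded this way, 20480 is `MS_BIND | MS_REC`, i.e. those three `/etc` files are recursive bind mounts from the host, which matches their `/dev/dm-0[...]` sources in the `findmnt` table.
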