+1 to this idea. It's a good stopgap solution while we explore better options and look into possible corner cases that the change to 750 brings.
I know that you're busy, so thanks for looking into this as well!

-Renan

On Mon, Oct 22, 2018 at 9:26 PM Stephan Erb <stephan....@blue-yonder.com> wrote:

> Hi Renan,
>
> Unfortunately, it might be even a bit more complicated: the executor is normally launched as root and then drops privileges for each Thermos process once it has been forked successfully. If the Mesos filesystem permissions are too narrow, subsequent operations managed by those processes will fail. Most notably, the executor will crash whenever it tries to rotate log files. At least, this is the behavior of the Mesos containerizer before the fix you have referenced.
>
> In the Docker case, the executor always runs as root. However, there might be other similar issues that only show up for long-running containers. I therefore see the broken SSH as a symptom of an underlying issue that we need to address.
>
> Given that this is currently blocking our progress, should we consider a chmod in
> https://github.com/apache/aurora/blob/32776792d273b36afbf4a1bab69a66fb06163ffd/src/main/python/apache/aurora/executor/common/sandbox.py#L173
> to restore the previous 755 mode for the sandbox directory?
>
> Best regards,
> Stephan
>
> On 16.10.18, 03:47, "Renan DelValle" <re...@apache.org> wrote:
>
> > All,
> >
> > As you may know, Mesos has changed the default permissions for the sandbox from 755 (-rwxr-xr-x) to 750 (-rwxr-x---) (https://issues.apache.org/jira/browse/MESOS-8332).
> >
> > Stephan Erb fixed most of the breakage caused by this change with his recent patch:
> > https://github.com/apache/aurora/commit/32776792d273b36afbf4a1bab69a66fb06163ffd
> >
> > Unfortunately, when it comes to Docker-based containers, the issue is a bit more complicated.
> >
> > Stephan and I have both looked into this and have been posting our findings here: https://github.com/apache/aurora/pull/42
> >
> > Unfortunately, and I speak for myself here, I don't think there is an easy way to keep our promise of allowing users to aurora task ssh into the sandbox of a Docker-container-based task.
> >
> > Problem:
> >
> > When a Docker container is launched, it is launched in its own namespace and every command is run as root (uid=0) by default. This means two things:
> >
> > A) None of the host's users exist inside the container, and therefore we don't know the uid of the role inside the job key.
> >
> > B) The sandbox for the dockerized task is owned by uid=0 and gid=0 on both the container and the host.
> >
> > Before Mesos 1.6, the permissions were open enough to allow aurora task ssh to see the sandbox of a Docker-based task on the host.
> >
> > From Mesos 1.6 on, aurora task ssh cannot see anything inside the sandbox of a Docker-based task, since by default it runs as user=role.
> >
> > tl;dr: by default, aurora task ssh lacks the permissions to see Docker-container-based Thermos sandboxes.
> >
> > Solutions:
> >
> > 1. Find a way to mirror host users in the container. (I'm not partial to this, as it adds a lot of complexity.)
> >
> > 2. Allow users to provide images with uids that match the local boxes. (Messy and error-prone.)
> >
> > 3. Leave as is (broken aurora task ssh for Docker-container-based Thermos sandboxes) and leave it to operators to provide access to these sandboxes. Users should still be able to see the files in the sandbox through the Aurora observer UI and the Mesos UI. (Sane, but potentially burdensome on operators.)
> >
> > I'd love to hear other solutions if anyone else has thought about this problem.
> >
> > -Renan
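Stephan's stopgap amounts to one chmod right after the sandbox directory is created. Below is a minimal sketch in Python. `restore_sandbox_mode` is a hypothetical helper, not Aurora's actual sandbox.py code, and a temporary directory stands in for the Mesos sandbox. The chmod only succeeds for the directory's owner or root, and the thread notes that the executor starts as root.

```python
import os
import stat
import tempfile


def restore_sandbox_mode(sandbox_dir: str, mode: int = 0o755) -> str:
    """Hypothetical helper: reset the sandbox directory's permission bits.

    Mesos 1.6+ creates sandboxes as 0o750. Setting 0o755 again gives the
    "other" class read and execute (traverse) access, as proposed in the thread.
    Returns the resulting mode rendered like `ls -l`.
    """
    os.chmod(sandbox_dir, mode)  # chmod is not affected by the process umask
    return stat.filemode(os.stat(sandbox_dir).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        os.chmod(sandbox, 0o750)  # simulate the new Mesos default
        print(stat.filemode(os.stat(sandbox).st_mode))  # drwxr-x---
        print(restore_sandbox_mode(sandbox))  # drwxr-xr-x
```

Because it is applied after creation, this kind of fix leaves the Mesos default alone. It only widens the one directory the executor controls.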
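Points A and B, together with the new 750 default, explain why aurora task ssh fails for Docker-based tasks. The role user is neither uid 0 nor a member of gid 0, so the kernel applies the "other" permission bits, and 750 grants "other" nothing. Below is a simplified sketch of that POSIX check in Python. It assumes plain mode bits with no ACLs or capabilities beyond root, and uses a hypothetical role uid/gid of 1000.

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class DirInfo:
    mode: int  # permission bits, e.g. 0o750
    uid: int   # owning user id
    gid: int   # owning group id


def can_list(d: DirInfo, uid: int, gids: frozenset) -> bool:
    """Simplified POSIX check: may this user read and traverse the directory?

    Exactly one class applies: owner if the uid matches, else group if any of
    the user's groups matches, else other. Root (uid 0) bypasses the check.
    """
    if uid == 0:
        return True
    if uid == d.uid:
        bits = (d.mode & stat.S_IRWXU) >> 6
    elif d.gid in gids:
        bits = (d.mode & stat.S_IRWXG) >> 3
    else:
        bits = d.mode & stat.S_IRWXO
    # Listing needs read (4); entering the directory needs execute (1).
    return (bits & 0o5) == 0o5


if __name__ == "__main__":
    old_sandbox = DirInfo(mode=0o755, uid=0, gid=0)  # pre-Mesos 1.6 default
    new_sandbox = DirInfo(mode=0o750, uid=0, gid=0)  # Mesos 1.6+ default
    role_uid, role_gids = 1000, frozenset({1000})    # hypothetical role user
    print(can_list(old_sandbox, role_uid, role_gids))  # True
    print(can_list(new_sandbox, role_uid, role_gids))  # False
```

The same model also shows why the proposed solutions help. Matching uids (option 2) or mirroring host users (option 1) would move the role user into the owner or group class. Restoring 755 would give the "other" class access again.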