Hi Renan,

Unfortunately, it might be even more complicated. The executor is normally launched as root and then drops privileges for each Thermos process once it has been forked successfully. If the Mesos filesystem permissions are too narrow, subsequent operations performed by those processes will fail. Most notably, the executor will crash whenever it tries to rotate log files. At least this was the behavior of the Mesos containerizer before the fix you referenced.
In the Docker case, the executor always runs as root. However, there might be other, similar issues that only show up for long-running containers. I therefore see the broken SSH as a symptom of an underlying issue that we need to address.

Given that this is currently blocking our progress: should we consider a chmod in https://github.com/apache/aurora/blob/32776792d273b36afbf4a1bab69a66fb06163ffd/src/main/python/apache/aurora/executor/common/sandbox.py#L173 to restore the previous mode of 755 for the sandbox directory?

Best regards,
Stephan

On 16.10.18, 03:47, "Renan DelValle" <re...@apache.org> wrote:

All,

As you may know, Mesos has changed the default permissions for the sandbox from 755 (-rwxr-xr-x) to 750 (-rwxr-x---) (https://issues.apache.org/jira/browse/MESOS-8332).

Stephan Erb fixed most of the breakage caused by this change with his recent patch:
https://github.com/apache/aurora/commit/32776792d273b36afbf4a1bab69a66fb06163ffd

Unfortunately, when it comes to docker based containers, the issue is a bit more complicated. Stephan and I have both looked into this and have been posting our findings here: https://github.com/apache/aurora/pull/42

Speaking only for myself, I don't think there is an easy way to keep our promise to allow users to aurora task ssh into the sandbox of a docker container based task.

Problem:

When a docker container is launched, it is launched in its own namespace and every command is run as root (uid=0) by default. This means two things:

A) None of the users of the host exist inside the container, and therefore we don't know the uid of the role inside the job key.

B) The sandbox for the dockerized task is owned by uid=0 and gid=0 on both the container and the host.

Before Mesos 1.6, the permissions were open enough to allow aurora task ssh to see the sandbox of a docker based task on the host.
From Mesos 1.6 on, aurora task ssh will not be able to see anything inside of the sandbox of a docker based task since by default it is run under user=role.

tl;dr: default aurora task ssh lacks the permissions to see docker container based thermos sandboxes.

Solutions:

1. Find a way to mirror host users in container. (Not partial to this as it adds a lot of complexity)

2. Allow users to provide images with uids that match the local boxes. (Messy and error prone)

4. Leave as is (broken aurora task ssh for docker container based thermos sandboxes) and leave it to operators to provide access to these sandboxes. Users should still be able to see these files in the sandbox through the Aurora observer UI and Mesos UI (Sane but potentially burdensome on operators).

I'd love to hear other solutions if anyone else has thought of this problem.

-Renan
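
For readers following the thread, the permission problem and the chmod suggested in the reply can be illustrated with a small Python sketch. This is not the code in Aurora's sandbox.py: can_enter and restore_sandbox_mode are hypothetical names, the uid/gid 1000 is a made-up stand-in for the role user, and the access check is a simplified model of POSIX mode bits (it ignores ACLs and capabilities).

```python
import os
import stat
import tempfile


def can_enter(mode, owner_uid, owner_gid, uid, gids):
    """Return True if a user may list and traverse a directory with this mode.

    Simplified POSIX model: root bypasses the mode bits; otherwise exactly
    one of the owner, group, or other triplets applies, and both the read
    and execute bits of that triplet are required.
    """
    if uid == 0:
        return True
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IXUSR
    elif owner_gid in gids:
        needed = stat.S_IRGRP | stat.S_IXGRP
    else:
        needed = stat.S_IROTH | stat.S_IXOTH
    return (mode & needed) == needed


def restore_sandbox_mode(path, mode=0o755):
    """Hypothetical helper: chmod a sandbox directory back to the older mode.

    Returns the permission bits actually set on the directory afterwards.
    """
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    # Sandbox owned by uid=0/gid=0, accessed by an assumed role user (1000).
    print(can_enter(0o750, 0, 0, 1000, {1000}))  # False: "other" has no bits
    print(can_enter(0o755, 0, 0, 1000, {1000}))  # True: "other" has r-x

    # Apply the chmod to a throwaway directory rather than a real sandbox.
    with tempfile.TemporaryDirectory() as sandbox:
        os.chmod(sandbox, 0o750)
        print(oct(restore_sandbox_mode(sandbox)))  # 0o755
```

The first two calls mirror the situation described above: with mode 750 and root ownership, a non-root role user falls into the "other" triplet and gets no access, while 755 grants read and traverse. Whether widening the mode is acceptable is exactly the open question in this thread; the sketch only shows the mechanics.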