+1 to this idea.

It's a good stopgap while we look for a better option and explore the
corner cases that the change to 750 brings.
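
To make the idea concrete, here is a minimal sketch of such a chmod. It is
not the actual patch: the helper names (restore_sandbox_mode,
others_can_enter) and the SANDBOX_MODE constant are hypothetical and do not
exist in sandbox.py. It only relies on os.chmod and the stat module from the
Python standard library.

```python
import os
import stat
import tempfile

# Assumption: 0o755 (rwxr-xr-x) is the mode the sandbox had before Mesos 1.6.
SANDBOX_MODE = 0o755


def restore_sandbox_mode(sandbox_path, mode=SANDBOX_MODE):
    """Hypothetical helper: reset the sandbox directory to the given mode.

    Returns the permission bits that are in effect afterwards.
    """
    os.chmod(sandbox_path, mode)
    return stat.S_IMODE(os.stat(sandbox_path).st_mode)


def others_can_enter(mode):
    """True if users outside the owner and group can list and enter the dir."""
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IXOTH)


if __name__ == "__main__":
    # Simulate a sandbox created with the new, narrower default of 750.
    with tempfile.TemporaryDirectory() as sandbox:
        os.chmod(sandbox, 0o750)
        print(oct(restore_sandbox_mode(sandbox)))
```

With 750 the "other" bits are empty, which is why a user who is neither the
owner nor in the group cannot see into the directory; others_can_enter makes
that difference between 750 and 755 explicit.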

I know that you're busy, so thanks for looking into this as well!

-Renan

On Mon, Oct 22, 2018 at 9:26 PM Stephan Erb <stephan....@blue-yonder.com>
wrote:

> Hi Renan,
>
> Unfortunately, it might even be a bit more complicated: The executor is
> normally launched as root and then drops the privileges for each Thermos
> process once it got forked successfully. If the Mesos filesystem
> permissions are too narrow, then subsequent operations managed by those
> processes will fail. Most notably, the executor will crash whenever it
> tries to rotate log files. At least this is the behavior of the Mesos
> containerizer before the fix you have referenced.
>
> In the Docker case, the executor always runs as root. However, there might
> be other, similar issues that only show up for long-running containers.
> I therefore see the broken SSH as a symptom of an underlying issue that we
> need to address.
>
> Given that this is currently blocking our progress: Should we consider a
> chmod in
> https://github.com/apache/aurora/blob/32776792d273b36afbf4a1bab69a66fb06163ffd/src/main/python/apache/aurora/executor/common/sandbox.py#L173
> to restore the previous mode of 755 for the sandbox directory?
>
> Best regards,
> Stephan
>
> On 16.10.18, 03:47, "Renan DelValle" <re...@apache.org> wrote:
>
>     All,
>
>     As you may know Mesos has changed the default permissions for the sandbox
>     from 755 (-rwxr-xr-x) to 750 (-rwxr-x---)
>     (https://issues.apache.org/jira/browse/MESOS-8332).
>
>     Stephan Erb fixed most of the breakage caused by this change with his
>     recent patch
>
> https://github.com/apache/aurora/commit/32776792d273b36afbf4a1bab69a66fb06163ffd
>
>     Unfortunately, when it comes to docker based containers, the issue is a bit
>     more complicated.
>
>     Stephan and I have both looked into this and have been posting our findings
>     here:
>     https://github.com/apache/aurora/pull/42
>
>     Unfortunately, and I speak for myself here, I don't think there is an easy
>     way to keep our promise to allow users to aurora task ssh into the sandbox
>     of a docker container based task.
>
>     Problem:
>
>     When a docker container is launched, it is launched in its own namespace
>     and every command is run as root (uid=0) by default. This means two things:
>
>     A) None of the users of the host exist inside the container and therefore
>     we don't know the uid of the role inside the job key.
>
>     B) The sandbox for the dockerized task is owned by uid=0 and gid=0 on both
>     the container and the host.
>
>     Before Mesos 1.6, the permissions were open enough to allow aurora task ssh
>     to see the sandbox of a docker based task on the host.
>
>     From Mesos 1.6 on, aurora task ssh will not be able to see anything inside
>     of the sandbox of a docker based task since by default it is run under
>     user=role.
>
>     tl;dr: default aurora task ssh lacks the permissions to see docker
>     container based thermos sandboxes.
>
>     Solutions:
>
>     1. Find a way to mirror host users in container. (Not partial to this as it
>     adds a lot of complexity)
>
>     2. Allow users to provide images with uids that match the local boxes.
>     (Messy and error prone)
>
>     4. Leave as is (broken aurora task ssh for docker container based thermos
>     sandboxes) and leave it to operators to provide access to these sandboxes.
>     Users should still be able to see these files in the sandbox through the
>     Aurora observer UI and Mesos UI (Sane but potentially burdensome on
>     operators).
>
>     I'd love to hear other solutions if anyone else has thought of this
>     problem.
>
>     -Renan
>
>
>
