Hi Renan,

Unfortunately, it might be even more complicated: the executor is normally
launched as root and then drops privileges for each Thermos process once
that process has been forked. If the Mesos filesystem permissions are too
narrow, subsequent operations performed by those processes will fail. Most
notably, the executor crashes whenever it tries to rotate log files. At
least, this was the behavior of the Mesos containerizer before the fix you
referenced.

In the Docker case, the executor always runs as root. However, there may be
other, similar issues that only show up in long-running containers. I
therefore see the broken SSH as a symptom of an underlying issue that we need
to address.
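
To make the SSH symptom concrete, here is a minimal sketch of the classic
owner/group/other check for a directory. It assumes the sandbox is owned by
root:root (as Renan describes) and uses a hypothetical role user with uid and
gid 1001; it ignores ACLs and capabilities, so treat it as an illustration
rather than what the kernel does in full:

```python
import stat

def can_enter(mode, owner_uid, owner_gid, uid, gids):
    """Return True if the user may list and traverse a directory (needs r and x)."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        needed = stat.S_IRUSR | stat.S_IXUSR
    elif owner_gid in gids:
        needed = stat.S_IRGRP | stat.S_IXGRP
    else:
        needed = stat.S_IROTH | stat.S_IXOTH
    return mode & needed == needed

# Hypothetical role user (uid/gid 1001); sandbox owned by root:root (0/0).
print(can_enter(0o755, 0, 0, 1001, {1001}))  # True
print(can_enter(0o750, 0, 0, 1001, {1001}))  # False
```

With 750 the role user falls into the "other" class and gets no bits at all,
which matches what we see from aurora task ssh.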

Given that this is currently blocking our progress: should we consider a chmod
in
https://github.com/apache/aurora/blob/32776792d273b36afbf4a1bab69a66fb06163ffd/src/main/python/apache/aurora/executor/common/sandbox.py#L173
to restore the previous mode of 755 for the sandbox directory?
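
Roughly what I have in mind, as a standalone sketch. The helper name
open_up_sandbox is made up for illustration and is not existing Aurora code;
the temporary directory merely stands in for a real sandbox path:

```python
import os
import stat
import tempfile

def open_up_sandbox(path, mode=0o755):
    """Set the directory mode explicitly and return the resulting permission bits."""
    # os.chmod sets the mode directly, so the process umask does not apply here.
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)

# Stand-in for a sandbox that Mesos created with the new 750 default.
with tempfile.TemporaryDirectory() as sandbox:
    os.chmod(sandbox, 0o750)
    print(oct(open_up_sandbox(sandbox)))
```

This only touches the sandbox directory itself, not its contents, which I
assume is enough to restore the old behavior.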

Best regards,
Stephan

On 16.10.18, 03:47, "Renan DelValle" <re...@apache.org> wrote:

    All,
    
    As you may know, Mesos has changed the default permissions for the sandbox
    from 755 (drwxr-xr-x) to 750 (drwxr-x---) (
    https://issues.apache.org/jira/browse/MESOS-8332).
    
    Stephan Erb fixed most of the breakage caused by this change with his
    recent patch:
    
    https://github.com/apache/aurora/commit/32776792d273b36afbf4a1bab69a66fb06163ffd
    
    Unfortunately, when it comes to docker based containers, the issue is a bit
    more complicated.
    
    Stephan and I have both looked into this and have been posting our findings
    here:
    https://github.com/apache/aurora/pull/42
    
    Unfortunately, and I speak for myself here, I don't think there is an easy
    way to keep our promise to allow users to aurora task ssh into the sandbox
    of a docker container based task.
    
    Problem:
    
    When a docker container is launched, it is launched in its own namespace
    and every command is run as root (uid=0) by default. This means two things:
    
    A) None of the users of the host exist inside the container and therefore
    we don't know the uid of the role inside the job key.
    
    B) The sandbox for the dockerized task is owned by uid=0 and gid=0 on both
    the container and the host.
    
    Before Mesos 1.6, the permissions were open enough to allow aurora task ssh
    to see the sandbox of a docker based task on the host.
    
    From Mesos 1.6 on, aurora task ssh will not be able to see anything inside
    of the sandbox of a docker based task since by default it is run under
    user=role.
    
    tl;dr: default aurora task ssh lacks the permissions to see docker
    container based thermos sandboxes.
    
    Solutions:
    
    1. Find a way to mirror host users in container. (Not partial to this as it
    adds a lot of complexity)
    
    2. Allow users to provide images with uids that match the local boxes.
    (Messy and error prone)
    
    4. Leave as is (broken aurora task ssh for docker container based thermos
    sandboxes) and leave it to operators to provide access to these
    sandboxes. Users should still be able to see these files in the sandbox
    through the Aurora observer UI and Mesos UI (Sane but potentially
    burdensome on operators).
    
    I'd love to hear other solutions if anyone else has thought of this problem.
    
    -Renan
    
