[ https://issues.apache.org/jira/browse/MESOS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Wu updated MESOS-5723:
-----------------------------
    Fix Version/s: 0.27.4
                   0.28.3

> SSL-enabled libprocess will leak incoming links to forks
> --------------------------------------------------------
>
>                 Key: MESOS-5723
>                 URL: https://issues.apache.org/jira/browse/MESOS-5723
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
>            Reporter: Joseph Wu
>            Assignee: Joseph Wu
>            Priority: Blocker
>              Labels: libprocess, mesosphere, ssl
>             Fix For: 0.28.3, 1.0.0, 0.27.4
>
>
> Encountered two different buggy behaviors that can be tracked down to the 
> same underlying problem.
> Repro #1 (non-crashy):
> (1) Start a master.  Doesn't matter if SSL is enabled or not.
> (2) Start an agent, with SSL enabled.  Downgrade support has the same 
> problem.  The master/agent {{link}} to one another.
> (3) Run a sleep task.  Keep this alive.  If you inspect FDs at this point, 
> you'll notice the task has inherited the {{link}} FD (master -> agent).
> (4) Restart the agent.  Due to (3), the master's {{link}} stays open.
> (5) Check master's logs for the agent's re-registration message.
> (6) Check the agent's logs for re-registration.  The message will not appear. 
>  The master is actually still using the old {{link}}, which is no longer 
> connected to the agent.
> ----
> Repro #2 (crashy):
> (1) Start a master.  Doesn't matter if SSL is enabled or not.
> (2) Start an agent, with SSL enabled.  Downgrade support has the same problem.
> (3) Run ~100 sleep tasks one after the other, keeping them all alive.  Each task 
> links back to the agent.  Due to an FD leak, each task will inherit the 
> incoming links from all other actors...
> (4) At some point, the agent will run out of FDs and the kernel will panic.
> ----
> It appears that the SSL socket {{accept}} call is missing {{os::nonblock}} 
> and {{os::cloexec}} calls:
> https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L794-L806
> For reference, here is the {{poll}} socket's {{accept}}:
> https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/poll_socket.cpp#L53-L75
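The missing step described above amounts to marking each accepted FD non-blocking and close-on-exec, so that a forked child that execs a task does not inherit the connection. A minimal sketch of that pattern, assuming a POSIX system and using plain {{fcntl}}; {{accept_nonblock_cloexec}} is a hypothetical helper for illustration, not the libprocess code or the stout {{os::nonblock}}/{{os::cloexec}} API:

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of libprocess): accept a connection and
// make the new FD non-blocking and close-on-exec, the two properties the
// poll socket's accept sets and the SSL socket's accept was missing.
// Returns the accepted FD, or -1 with errno set on failure.
int accept_nonblock_cloexec(int listen_fd) {
  int fd = ::accept(listen_fd, nullptr, nullptr);
  if (fd < 0) {
    return -1;
  }

  // O_NONBLOCK is a file status flag (F_GETFL/F_SETFL).
  int status = ::fcntl(fd, F_GETFL);
  if (status < 0 || ::fcntl(fd, F_SETFL, status | O_NONBLOCK) < 0) {
    int saved = errno;
    ::close(fd);
    errno = saved;
    return -1;
  }

  // FD_CLOEXEC is a descriptor flag (F_GETFD/F_SETFD); with it set, the
  // FD is closed automatically when a forked child calls exec, so a
  // launched task does not keep the link open.
  int fdflags = ::fcntl(fd, F_GETFD);
  if (fdflags < 0 || ::fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) < 0) {
    int saved = errno;
    ::close(fd);
    errno = saved;
    return -1;
  }

  return fd;
}
```

Setting the flags after {{accept}} leaves a short window in which another thread could fork and exec before {{FD_CLOEXEC}} is applied. On Linux, {{accept4}} with {{SOCK_NONBLOCK | SOCK_CLOEXEC}} sets both flags atomically and avoids that window.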



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
