[ 
https://issues.apache.org/jira/browse/MESOS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-5748:
--------------------------------

    Assignee: Joseph Wu

> Potential segfault in `link` and `send` when linking to a remote process
> ------------------------------------------------------------------------
>
>                 Key: MESOS-5748
>                 URL: https://issues.apache.org/jira/browse/MESOS-5748
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 0.22.0, 0.23.0, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
>            Reporter: Joseph Wu
>            Assignee: Joseph Wu
>              Labels: libprocess, mesosphere
>             Fix For: 1.0.0
>
>
> There is a race in the SocketManager between a remote {{link}} and the 
> disconnection of the underlying socket.
> We potentially segfault here: 
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512
> {{\*socket}} dereferences the shared pointer underpinning the {{Socket*}} 
> object.  However, the code above this line actually has ownership of the 
> pointer:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499
> If the socket dies during the link, {{ignore_recv_data}} may delete the 
> Socket out from under {{link}}:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411
> ----
> The same race exists for {{send}}.
> This race was discovered while running a new test in repetition:
> https://reviews.apache.org/r/49175/
> On OS X, I hit the race consistently, about once every 500-800 repetitions:
> {code}
> 3rdparty/libprocess/libprocess-tests \
>   --gtest_filter="ProcessRemoteLinkTest.RemoteLink" \
>   --gtest_break_on_failure \
>   --gtest_repeat=1000
> {code}
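
The race described above is a use-after-free: one code path looks up a
socket and keeps using it while another path deletes it on disconnect. A
minimal sketch of one common way to avoid this class of bug is to copy a
shared handle while holding the lock, so the object stays alive until the
caller is done with it. This is not necessarily the fix adopted for this
issue, and FakeSocket, FakeSocketTable and linkThenUse are hypothetical
names for illustration, not libprocess APIs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a connection object; not libprocess's Socket.
struct FakeSocket {
  explicit FakeSocket(int fd) : fd(fd) {}
  int fd;
};

// Hypothetical stand-in for a mutex-guarded socket table.
class FakeSocketTable {
public:
  void add(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_[fd] = std::make_shared<FakeSocket>(fd);
  }

  // Called on disconnection: drops only the table's reference.
  void close(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_.erase(fd);
  }

  // Returns a copy of the shared handle taken under the lock, or an
  // empty pointer if the socket is already gone.
  std::shared_ptr<FakeSocket> acquire(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = sockets_.find(fd);
    if (it == sockets_.end()) {
      return std::shared_ptr<FakeSocket>();
    }
    return it->second;
  }

  bool contains(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    return sockets_.count(fd) > 0;
  }

private:
  std::mutex mutex_;
  std::map<int, std::shared_ptr<FakeSocket>> sockets_;
};

// Simulates the problematic interleaving in a single thread:
// acquire the socket, let a disconnect happen, then use the socket.
int linkThenUse(FakeSocketTable& table, int fd) {
  std::shared_ptr<FakeSocket> socket = table.acquire(fd);
  if (!socket) {
    return -1;  // Socket was already closed; nothing to link to.
  }
  table.close(fd);    // The disconnect lands here, mid-link.
  return socket->fd;  // Still valid: our copy keeps the object alive.
}
```

With a raw pointer in the table, the close() call in linkThenUse would
free the object before the final read. With the shared handle, the object
is destroyed only when the last copy goes out of scope.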



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
