[ https://issues.apache.org/jira/browse/MESOS-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph Wu updated MESOS-5748:
-----------------------------
    Fix Version/s: 0.27.4
                   0.28.3

> Potential segfault in `link` and `send` when linking to a remote process
> ------------------------------------------------------------------------
>
>                 Key: MESOS-5748
>                 URL: https://issues.apache.org/jira/browse/MESOS-5748
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 0.22.0, 0.23.0, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.0
>            Reporter: Joseph Wu
>            Assignee: Joseph Wu
>              Labels: libprocess, mesosphere
>             Fix For: 0.28.3, 1.0.0, 0.27.4
>
>
> There is a race in the SocketManager, between a remote {{link}} and
> disconnection of the underlying socket.
>
> We potentially segfault here:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512
>
> {{\*socket}} dereferences the shared pointer underpinning the {{Socket*}}
> object. However, the code above this line actually has ownership of the
> pointer:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499
>
> If the socket dies during the link, the {{ignore_recv_data}} may delete the
> Socket underneath {{link}}:
> https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411
>
> ----
>
> The same race exists for {{send}}.
>
> This race was discovered while running a new test in repetition:
> https://reviews.apache.org/r/49175/
>
> On OSX, I hit the race consistently every 500-800 repetitions:
> {code}
> 3rdparty/libprocess/libprocess-tests --gtest_filter="ProcessRemoteLinkTest.RemoteLink" --gtest_break_on_failure --gtest_repeat=1000
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
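
The race described in the report follows a general pattern: one code path looks up an object in a shared table and uses it later, while another path removes and frees that object in between. The sketch below shows one common way to make that ordering safe, by taking a counted reference while the table's lock is held. It is an illustration only and not necessarily the change applied to libprocess. `Connection`, `SocketRegistry`, and `linkAcrossClose` are hypothetical names invented for this example and are not libprocess APIs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a connected socket; not a libprocess type.
struct Connection {
  explicit Connection(int fd) : fd(fd) {}

  std::string describe() const { return "connection " + std::to_string(fd); }

  int fd;
};

// Hypothetical table of live connections, guarded by a mutex.
class SocketRegistry {
public:
  void add(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_[fd] = std::make_shared<Connection>(fd);
  }

  // Returns a counted reference copied while the lock is held,
  // or an empty pointer if the connection is no longer registered.
  std::shared_ptr<Connection> acquire(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = sockets_.find(fd);
    if (it == sockets_.end()) {
      return nullptr;
    }
    return it->second;
  }

  // Drops the table's reference. The object is destroyed only once
  // every outstanding reference from acquire() is gone as well.
  void close(int fd) {
    std::lock_guard<std::mutex> lock(mutex_);
    sockets_.erase(fd);
  }

private:
  std::mutex mutex_;
  std::map<int, std::shared_ptr<Connection>> sockets_;
};

// Replays the problematic ordering deterministically: the disconnect
// path runs between the lookup and the use of the connection.
std::string linkAcrossClose(SocketRegistry& registry, int fd) {
  std::shared_ptr<Connection> held = registry.acquire(fd);
  assert(held != nullptr);

  registry.close(fd);  // Stands in for the disconnect handler.

  return held->describe();  // Safe: `held` keeps the object alive.
}
```

With a raw pointer stored in the table and deleted in `close`, the final `describe()` call would be a use-after-free. Here the object outlives the `close` call because the caller still holds a reference.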