Joseph Wu created MESOS-5748:
--------------------------------

             Summary: Potential segfault in `link` and `send` when linking to a remote process
                 Key: MESOS-5748
                 URL: https://issues.apache.org/jira/browse/MESOS-5748
             Project: Mesos
          Issue Type: Bug
          Components: libprocess
    Affects Versions: 0.28.0, 0.27.0, 0.26.0, 0.25.0, 0.24.0, 0.23.0, 0.22.0
            Reporter: Joseph Wu
             Fix For: 1.0.0

There is a race in the SocketManager between a remote {{link}} and disconnection of the underlying socket.

We potentially segfault here:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1512

{{*socket}} dereferences the shared pointer underpinning the {{Socket*}} object.  However, the code above this line actually has ownership of the pointer:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1494-L1499

If the socket dies during the link, {{ignore_recv_data}} may delete the Socket underneath {{link}}:
https://github.com/apache/mesos/blob/215e79f571a989e998488077d713c28c7528926e/3rdparty/libprocess/src/process.cpp#L1399-L1411

----
The same race exists for {{send}}.

This race was discovered while running a new test in repetition:
https://reviews.apache.org/r/49175/

On OSX, I hit the race consistently, once every 500-800 repetitions:
{code}
3rdparty/libprocess/libprocess-tests --gtest_filter="ProcessRemoteLinkTest.RemoteLink" --gtest_break_on_failure --gtest_repeat=1000
{code}
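
To illustrate the shape of the race described above, here is a minimal sketch. It is not the libprocess code: {{Manager}}, {{Socket}}, {{link}}, {{close}}, and {{linkThenDisconnect}} are hypothetical stand-ins, and copying the handle while the lock is held is only one assumed way to avoid the dangling dereference, not necessarily the fix that landed for this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a reference-counted socket handle:
// copies of a Socket share one underlying implementation.
struct Socket {
  explicit Socket(int fd) : impl(std::make_shared<int>(fd)) {}
  int fd() const { return *impl; }
  std::shared_ptr<int> impl;
};

// Hypothetical, heavily simplified manager (not the real SocketManager).
// It owns heap-allocated Socket handles through raw pointers, so any
// `*socket` performed after the lock is released may touch freed memory.
class Manager {
public:
  ~Manager() {
    for (auto& entry : sockets) {
      delete entry.second;
    }
  }

  // Analogous to `link`: registers the handle and returns a copy that
  // was taken while the lock was still held. The copy keeps the
  // underlying implementation alive even if `close` runs right after.
  Socket link(int fd) {
    std::lock_guard<std::mutex> lock(mutex);
    Socket* socket = new Socket(fd);
    sockets[fd] = socket;
    return *socket;  // Dereference happens under the lock.
  }

  // Analogous to the disconnection path: deletes the stored handle.
  void close(int fd) {
    std::lock_guard<std::mutex> lock(mutex);
    auto it = sockets.find(fd);
    if (it != sockets.end()) {
      delete it->second;
      sockets.erase(it);
    }
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex);
    return sockets.size();
  }

private:
  std::mutex mutex;
  std::map<int, Socket*> sockets;
};

// Serializes the problematic interleaving: the socket "dies" between
// the link and the first use of the handle. Because the caller holds
// its own copy, reading the descriptor afterwards is still safe.
int linkThenDisconnect(int fd) {
  Manager manager;
  Socket copy = manager.link(fd);
  manager.close(fd);
  return copy.fd();
}
```

The unsafe variant would return the raw {{Socket*}} from {{link}} and dereference it after the lock is released; if {{close}} runs in between, that dereference reads freed memory, which matches the segfault pattern reported here.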