[ 
https://issues.apache.org/jira/browse/MESOS-6759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746431#comment-15746431
 ] 

Jie Yu edited comment on MESOS-6759 at 12/13/16 10:11 PM:
----------------------------------------------------------

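OK, found more clues. It looks like the listening socket (fd 9 in the trace 
below) gets closed after the first test run, and the same fd number is then 
reused as the listening socket in the second run. The 'accept' from the first 
test run is never discarded (it is still polling that fd), so it appears to 
pick up the second run's connection and close it (see accept(9) = 11 followed 
by close(11) below).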
{noformat}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from IOSwitchboardServerTest
[ RUN      ] IOSwitchboardServerTest.AttachOutput
[pid 45388] close(7)                    = 0
[pid 45388] close(8)                    = 0
[pid 45388] bind(9, {sa_family=AF_LOCAL, 
sun_path="/tmp/9OMQri/mesos-io-switchboard"}, 110) = 0
[pid 45388] close(10)                   = 0
[pid 45388] connect(10, {sa_family=AF_LOCAL, 
sun_path="/tmp/9OMQri/mesos-io-switchboard"}, 110) = 0
[pid 45453] accept(9, {sa_family=AF_LOCAL, NULL}, [2]) = 11
...
[pid 45388] close(9)                    = 0
...
[       OK ] IOSwitchboardServerTest.AttachOutput (3898 ms)
[----------] 1 test from IOSwitchboardServerTest (3898 ms total)
...
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from IOSwitchboardServerTest
[ RUN      ] IOSwitchboardServerTest.AttachOutput
[pid 45388] close(7)                    = 0
[pid 45388] close(8)                    = 0
[pid 45388] bind(9, {sa_family=AF_LOCAL, 
sun_path="/tmp/3P0j2A/mesos-io-switchboard"}, 110) = 0
[pid 45388] connect(10, {sa_family=AF_LOCAL, 
sun_path="/tmp/3P0j2A/mesos-io-switchboard"}, 110) = 0
[pid 45453] accept(9, {sa_family=AF_LOCAL, NULL}, [2]) = 11
[pid 45453] close(11)                   = 0
[pid 45453] accept(9, 0x7fb700cc51d0, [128]) = -1 EAGAIN (Resource temporarily 
unavailable)
/home/jie/workspace/mesos/src/tests/containerizer/io_switchboard_tests.cpp:271: 
Failure
(response).failure(): Disconnected
/home/jie/workspace/mesos/src/tests/containerizer/io_switchboard_tests.cpp:272: 
Failure
(response).failure(): Disconnected
F1213 14:06:02.095942 45388 future.hpp:1137] Check failed: !isFailed() 
Future::get() but state == FAILED: Disconnected
{noformat}



> IOSwitchboardServerTest.AttachOutput has CHECK failure if run it multiple 
> times.
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-6759
>                 URL: https://issues.apache.org/jira/browse/MESOS-6759
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Jie Yu
>
> I can easily reproduce this issue on my CentOS 7 dev box with the following 
> command:
> {noformat}
> GLOG_v=1 bin/mesos-tests.sh 
> --gtest_filter=IOSwitchboardServerTest.AttachOutput --verbose --gtest_repeat=2
> ....
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from IOSwitchboardServerTest
> [ RUN      ] IOSwitchboardServerTest.AttachOutput
> I1208 10:46:31.574084 41813 poll_socket.cpp:209] Socket error while sending: 
> Broken pipe
> /home/jie/workspace/mesos/src/tests/containerizer/io_switchboard_tests.cpp:265:
>  Failure
> (response).failure(): Disconnected
> /home/jie/workspace/mesos/src/tests/containerizer/io_switchboard_tests.cpp:266:
>  Failure
> (response).failure(): Disconnected
> F1208 10:46:31.574919 41751 future.hpp:1137] Check failed: !isFailed() 
> Future::get() but state == FAILED: Disconnected
> *** Check failure stack trace: ***
>     @     0x7fc3f35a633a  google::LogMessage::Fail()
>     @     0x7fc3f35a6299  google::LogMessage::SendToLog()
>     @     0x7fc3f35a5caa  google::LogMessage::Flush()
>     @     0x7fc3f35a89de  google::LogMessageFatal::~LogMessageFatal()
>     @           0xb6a352  process::Future<>::get()
>     @          0x1a050fe  
> mesos::internal::tests::IOSwitchboardServerTest_AttachOutput_Test::TestBody()
>     @          0x1c54ce2  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x1c4fe00  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x1c31491  testing::Test::Run()
>     @          0x1c31c14  testing::TestInfo::Run()
>     @          0x1c3225a  testing::TestCase::Run()
>     @          0x1c38b34  testing::internal::UnitTestImpl::RunAllTests()
>     @          0x1c55907  
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
>     @          0x1c50948  
> testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @          0x1c3787a  testing::UnitTest::Run()
>     @          0x11cc653  RUN_ALL_TESTS()
>     @          0x11cc209  main
>     @     0x7fc3ecb61b15  __libc_start_main
>     @           0xab5e89  (unknown)
> Aborted (core dumped)
> {noformat}



