[jira] [Updated] (MESOS-5000) MasterTest.MasterLost is flaky

Benjamin Bannier (JIRA) Tue, 22 Mar 2016 08:09:47 -0700

     [ https://issues.apache.org/jira/browse/MESOS-5000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Benjamin Bannier updated MESOS-5000:
------------------------------------
    Description: 
The tests {{MasterTest.MasterLost}} and 
{{ExceptionTest.DisallowSchedulerActionsOnAbort}} fail at least half the time 
on OS X (clang, unoptimized build, commit {{30efac7}}), e.g.,
{code}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from MasterTest
[ RUN      ] MasterTest.MasterLost
*** Aborted at 1458650698 (unix time) try "date -d @1458650698" if you are using GNU date ***
PC: @        0x109685fcc mesos::internal::state::State::store()
*** SIGSEGV (@0x0) received by PID 18620 (TID 0x111259000) stack trace: ***
    @     0x7fff850e1f1a _sigtramp
    @        0x108c74eaf boost::uuids::detail::sha1::process_byte_impl()
    @        0x1095fd723 mesos::internal::state::protobuf::State::store<>()
    @        0x1095fbd3e mesos::internal::master::RegistrarProcess::update()
    @        0x1095fcf6c mesos::internal::master::RegistrarProcess::_apply()
    @        0x1096797a0 _ZZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS_5OwnedINS3_9OperationEEES7_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSC_FSA_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESL_
    @        0x1096795f0 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS3_5OwnedINS7_9OperationEEESB_EENS3_6FutureIT_EERKNS3_3PIDIT0_EEMSG_FSE_T1_ET2_EUlPNS3_11ProcessBaseEE_SP_EEEvDpOT_
    @        0x1096792d9 _ZNSt3__110__function6__funcIZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS2_5OwnedINS6_9OperationEEESA_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSF_FSD_T1_ET2_EUlPNS2_11ProcessBaseEE_NS_9allocatorISP_EEFvSO_EEclEOSO_
    @        0x10b2e9e4c std::__1::function<>::operator()()
    @        0x10b2e9d9c process::ProcessBase::visit()
    @        0x10b31d26e process::DispatchEvent::visit()
    @        0x108ad7d81 process::ProcessBase::serve()
    @        0x10b2e3cb4 process::ProcessManager::resume()
    @        0x10b36c479 process::ProcessManager::init_threads()::$_1::operator()()
    @        0x10b36c0a2 _ZNSt3__114__thread_proxyINS_5tupleIJNS_6__bindIZN7process14ProcessManager12init_threadsEvE3$_1JNS_17reference_wrapperIKNS_6atomicIbEEEEEEEEEEEEPvSD_
    @     0x7fff90eca05a _pthread_body
    @     0x7fff90ec9fd7 _pthread_start
    @     0x7fff90ec73ed thread_start
{code}

{{FaultToleranceTest.SchedulerFailover}} sometimes fails with the same 
stack trace as well.

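I traced this to the recent refactoring of the test helpers (MESOS-4633, 
MESOS-4634), but {{git bisect}} could not narrow it down further because the 
remaining candidate commits had to be skipped: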
{code}
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
75ca1e6c9fde655c41fdf835aa20c47570d21f10
56e9406763e8514a7557ab3862d2f352a61425d5
b377557c2bfc35c894e87becb47122955540f133
7bf6e4f70131175edd4d6d77ea0dc7692b3e72ae
c7df1d7bcb1604c95800871cc0473c946e5b5d16
951539317525f3afe9490ed098617e5d4563a80a
We cannot bisect more!
{code}

It appears the lifetimes of some objects are still not ordered correctly.
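
To illustrate the kind of ordering problem suspected here, the following is a minimal sketch and not the Mesos code. {{Storage}}, {{Registrar}}, {{events()}} and {{destructionOrder()}} are hypothetical names invented for this example. It only relies on the C++ rule that local objects are destroyed in reverse order of declaration, so an object holding a non-owning pointer should be declared after the object it points to.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not Mesos classes.

// Shared log that records the order in which destructors run.
std::vector<std::string>& events() {
  static std::vector<std::string> log;
  return log;
}

struct Storage {
  ~Storage() { events().push_back("~Storage"); }
};

struct Registrar {
  explicit Registrar(Storage* s) : storage(s) {}
  ~Registrar() { events().push_back("~Registrar"); }
  Storage* storage;  // Non-owning: must not outlive the Storage it points to.
};

// Returns the destruction order for a scope that declares the Storage
// first and the Registrar second. Locals are destroyed in reverse order
// of declaration, so the Registrar goes away before the Storage it uses.
std::vector<std::string> destructionOrder() {
  events().clear();
  {
    Storage storage;
    Registrar registrar(&storage);
  }
  return events();
}
```

If the two declarations were swapped (with the pointer assigned afterwards), {{Storage}} would be destroyed first and any later access through {{Registrar::storage}} would touch a dead object, which is one way a crash like the one above could arise. Whether that is the actual cause here is not established.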



> MasterTest.MasterLost is flaky
> ------------------------------
>
>                 Key: MESOS-5000
>                 URL: https://issues.apache.org/jira/browse/MESOS-5000
>             Project: Mesos
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.29.0
>            Reporter: Benjamin Bannier
>              Labels: flaky-test, mesosphere
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
