[ https://issues.apache.org/jira/browse/MESOS-5000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Bannier updated MESOS-5000:
------------------------------------
    Description:
The test {{MasterTest.MasterLost}} and {{ExceptionTest.DisallowSchedulerActionsOnAbort}} fail at least half the time under OS X (clang, not optimized, {{30efac7}}), e.g.,
{code}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from MasterTest
[ RUN ] MasterTest.MasterLost
*** Aborted at 1458650698 (unix time) try "date -d @1458650698" if you are using GNU date ***
PC: @ 0x109685fcc mesos::internal::state::State::store()
*** SIGSEGV (@0x0) received by PID 18620 (TID 0x111259000) stack trace: ***
@ 0x7fff850e1f1a _sigtramp
@ 0x108c74eaf boost::uuids::detail::sha1::process_byte_impl()
@ 0x1095fd723 mesos::internal::state::protobuf::State::store<>()
@ 0x1095fbd3e mesos::internal::master::RegistrarProcess::update()
@ 0x1095fcf6c mesos::internal::master::RegistrarProcess::_apply()
@ 0x1096797a0 _ZZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS_5OwnedINS3_9OperationEEES7_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSC_FSA_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESL_
@ 0x1096795f0 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS3_5OwnedINS7_9OperationEEESB_EENS3_6FutureIT_EERKNS3_3PIDIT0_EEMSG_FSE_T1_ET2_EUlPNS3_11ProcessBaseEE_SP_EEEvDpOT_
@ 0x1096792d9 _ZNSt3__110__function6__funcIZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS2_5OwnedINS6_9OperationEEESA_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSF_FSD_T1_ET2_EUlPNS2_11ProcessBaseEE_NS_9allocatorISP_EEFvSO_EEclEOSO_
@ 0x10b2e9e4c std::__1::function<>::operator()()
@ 0x10b2e9d9c process::ProcessBase::visit()
@ 0x10b31d26e process::DispatchEvent::visit()
@ 0x108ad7d81 process::ProcessBase::serve()
@ 0x10b2e3cb4 process::ProcessManager::resume()
@ 0x10b36c479 process::ProcessManager::init_threads()::$_1::operator()()
@ 0x10b36c0a2
_ZNSt3__114__thread_proxyINS_5tupleIJNS_6__bindIZN7process14ProcessManager12init_threadsEvE3$_1JNS_17reference_wrapperIKNS_6atomicIbEEEEEEEEEEEEPvSD_
@ 0x7fff90eca05a _pthread_body
@ 0x7fff90ec9fd7 _pthread_start
@ 0x7fff90ec73ed thread_start
{code}
Sometimes also {{FaultToleranceTest.SchedulerFailover}} fails with the same stack trace.

I could trace this to the recent refactoring of the test helpers (MESOS-4633, MESOS-4634),
{code}
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
75ca1e6c9fde655c41fdf835aa20c47570d21f10
56e9406763e8514a7557ab3862d2f352a61425d5
b377557c2bfc35c894e87becb47122955540f133
7bf6e4f70131175edd4d6d77ea0dc7692b3e72ae
c7df1d7bcb1604c95800871cc0473c946e5b5d16
951539317525f3afe9490ed098617e5d4563a80a
We cannot bisect more!
{code}
It appears the lifetimes of some objects are still not ordered correctly.

  was:
The test {{MasterTest.MasterLost}} and {{ExceptionTest.DisallowSchedulerActionsOnAbort}} fail at least half the time under OS X (clang, not optimized, {{30efac7}}), e.g.,
{code}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from MasterTest
[ RUN ] MasterTest.MasterLost
*** Aborted at 1458650698 (unix time) try "date -d @1458650698" if you are using GNU date ***
PC: @ 0x109685fcc mesos::internal::state::State::store()
*** SIGSEGV (@0x0) received by PID 18620 (TID 0x111259000) stack trace: ***
@ 0x7fff850e1f1a _sigtramp
@ 0x108c74eaf boost::uuids::detail::sha1::process_byte_impl()
@ 0x1095fd723 mesos::internal::state::protobuf::State::store<>()
@ 0x1095fbd3e mesos::internal::master::RegistrarProcess::update()
@ 0x1095fcf6c mesos::internal::master::RegistrarProcess::_apply()
@ 0x1096797a0 _ZZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS_5OwnedINS3_9OperationEEES7_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSC_FSA_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESL_
@ 0x1096795f0 _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS3_5OwnedINS7_9OperationEEESB_EENS3_6FutureIT_EERKNS3_3PIDIT0_EEMSG_FSE_T1_ET2_EUlPNS3_11ProcessBaseEE_SP_EEEvDpOT_
@ 0x1096792d9 _ZNSt3__110__function6__funcIZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS2_5OwnedINS6_9OperationEEESA_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSF_FSD_T1_ET2_EUlPNS2_11ProcessBaseEE_NS_9allocatorISP_EEFvSO_EEclEOSO_
@ 0x10b2e9e4c std::__1::function<>::operator()()
@ 0x10b2e9d9c process::ProcessBase::visit()
@ 0x10b31d26e process::DispatchEvent::visit()
@ 0x108ad7d81 process::ProcessBase::serve()
@ 0x10b2e3cb4 process::ProcessManager::resume()
@ 0x10b36c479 process::ProcessManager::init_threads()::$_1::operator()()
@ 0x10b36c0a2 _ZNSt3__114__thread_proxyINS_5tupleIJNS_6__bindIZN7process14ProcessManager12init_threadsEvE3$_1JNS_17reference_wrapperIKNS_6atomicIbEEEEEEEEEEEEPvSD_
@ 0x7fff90eca05a _pthread_body
@ 0x7fff90ec9fd7 _pthread_start
@ 0x7fff90ec73ed thread_start
{code}

I could trace this to the recent refactoring of the test helpers (MESOS-4633, MESOS-4634),
{code}
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
75ca1e6c9fde655c41fdf835aa20c47570d21f10
56e9406763e8514a7557ab3862d2f352a61425d5
b377557c2bfc35c894e87becb47122955540f133
7bf6e4f70131175edd4d6d77ea0dc7692b3e72ae
c7df1d7bcb1604c95800871cc0473c946e5b5d16
951539317525f3afe9490ed098617e5d4563a80a
We cannot bisect more!
{code}
It appears the lifetimes of some objects are still not ordered correctly.


> MasterTest.MasterLost is flaky
> ------------------------------
>
> Key: MESOS-5000
> URL: https://issues.apache.org/jira/browse/MESOS-5000
> Project: Mesos
> Issue Type: Bug
> Components: test
> Affects Versions: 0.29.0
> Reporter: Benjamin Bannier
> Labels: flaky-test, mesosphere
>
> The test {{MasterTest.MasterLost}} and
> {{ExceptionTest.DisallowSchedulerActionsOnAbort}} fail at least half the time
> under OS X (clang, not optimized, {{30efac7}}), e.g.,
> {code}
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from MasterTest
> [ RUN ] MasterTest.MasterLost
> *** Aborted at 1458650698 (unix time) try "date -d @1458650698" if you are
> using GNU date ***
> PC: @ 0x109685fcc mesos::internal::state::State::store()
> *** SIGSEGV (@0x0) received by PID 18620 (TID 0x111259000) stack trace: ***
> @ 0x7fff850e1f1a _sigtramp
> @ 0x108c74eaf boost::uuids::detail::sha1::process_byte_impl()
> @ 0x1095fd723 mesos::internal::state::protobuf::State::store<>()
> @ 0x1095fbd3e mesos::internal::master::RegistrarProcess::update()
> @ 0x1095fcf6c mesos::internal::master::RegistrarProcess::_apply()
> @ 0x1096797a0
> _ZZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS_5OwnedINS3_9OperationEEES7_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSC_FSA_T1_ET2_ENKUlPNS_11ProcessBaseEE_clESL_
> @ 0x1096795f0
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS3_5OwnedINS7_9OperationEEESB_EENS3_6FutureIT_EERKNS3_3PIDIT0_EEMSG_FSE_T1_ET2_EUlPNS3_11ProcessBaseEE_SP_EEEvDpOT_
> @ 0x1096792d9
> _ZNSt3__110__function6__funcIZN7process8dispatchIbN5mesos8internal6master16RegistrarProcessENS2_5OwnedINS6_9OperationEEESA_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSF_FSD_T1_ET2_EUlPNS2_11ProcessBaseEE_NS_9allocatorISP_EEFvSO_EEclEOSO_
> @ 0x10b2e9e4c std::__1::function<>::operator()()
> @ 0x10b2e9d9c process::ProcessBase::visit()
> @ 0x10b31d26e process::DispatchEvent::visit()
> @ 0x108ad7d81 process::ProcessBase::serve()
> @ 0x10b2e3cb4 process::ProcessManager::resume()
> @ 0x10b36c479
> process::ProcessManager::init_threads()::$_1::operator()()
> @ 0x10b36c0a2
> _ZNSt3__114__thread_proxyINS_5tupleIJNS_6__bindIZN7process14ProcessManager12init_threadsEvE3$_1JNS_17reference_wrapperIKNS_6atomicIbEEEEEEEEEEEEPvSD_
> @ 0x7fff90eca05a _pthread_body
> @ 0x7fff90ec9fd7 _pthread_start
> @ 0x7fff90ec73ed thread_start
> {code}
> Sometimes also {{FaultToleranceTest.SchedulerFailover}} fails with the same
> stack trace.
> I could trace this to the recent refactoring of the test helpers (MESOS-4633,
> MESOS-4634),
> {code}
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> 75ca1e6c9fde655c41fdf835aa20c47570d21f10
> 56e9406763e8514a7557ab3862d2f352a61425d5
> b377557c2bfc35c894e87becb47122955540f133
> 7bf6e4f70131175edd4d6d77ea0dc7692b3e72ae
> c7df1d7bcb1604c95800871cc0473c946e5b5d16
> 951539317525f3afe9490ed098617e5d4563a80a
> We cannot bisect more!
> {code}
> It appears the lifetimes of some objects are still not ordered correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)