[ https://issues.apache.org/jira/browse/MESOS-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452536#comment-16452536 ]
Greg Mann commented on MESOS-8687:
----------------------------------

Indeed, it seems likely that {{_consume()}} is dispatched by one instance of the master actor but is actually executed on a second instance of the master actor after master failover. As suggested by [~bennoe], perhaps we could add a {{Clock::settle()}} immediately after the master is reset in the test.

> Check failure in `ProcessBase::_consume()`.
> -------------------------------------------
>
>                 Key: MESOS-8687
>                 URL: https://issues.apache.org/jira/browse/MESOS-8687
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 1.6.0
>         Environment: ec2 CentOS 7 with SSL
>            Reporter: Alexander Rukletsov
>            Assignee: Benjamin Mahler
>            Priority: Major
>              Labels: flaky-test, reliability
>         Attachments: MasterAPITest.MasterFailover-with-CHECK.txt, MasterFailover-badrun.txt
>
>
> Observed a segfault in the {{MasterAPITest.MasterFailover}} test:
> {noformat}
> 10:59:04 I0319 10:59:04.312197 3274 master.cpp:649] Authorization enabled
> 10:59:04 F0319 10:59:04.312772 3274 owned.hpp:110] Check failed: 'get()' Must be non NULL
> 10:59:04 *** Check failure stack trace: ***
> 10:59:04 I0319 10:59:04.313470 3279 hierarchical.cpp:175] Initialized hierarchical allocator process
> 10:59:04 I0319 10:59:04.313500 3279 whitelist_watcher.cpp:77] No whitelist given
> 10:59:04 @ 0x7fe82d44e0cd google::LogMessage::Fail()
> 10:59:04 @ 0x7fe82d44ff1d google::LogMessage::SendToLog()
> 10:59:04 @ 0x7fe82d44dcb3 google::LogMessage::Flush()
> 10:59:04 @ 0x7fe82d450919 google::LogMessageFatal::~LogMessageFatal()
> 10:59:04 @ 0x7fe82d3cee16 google::CheckNotNull<>()
> 10:59:04 @ 0x7fe82d3b4253 process::ProcessBase::_consume()
> 10:59:04 @ 0x7fe82d3b4a66 _ZNO6lambda12CallableOnceIFN7process6FutureINS1_4http8ResponseEEEvEE10CallableFnINS_8internal7PartialIZNS1_11ProcessBase7consumeEONS1_9HttpEventEEUlRKNS1_5OwnedINS3_7RequestEEEE_JSG_EEEEclEv
> 10:59:04 @ 0x7fe82c39c3ca _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureINS1_4http8ResponseEEEEclINS0_IFSE_vEEEEESE_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISD_EESt14default_deleteISQ_EEOSI_S3_E_JST_SI_St12_PlaceholderILi1EEEEEEclEOS3_
> 10:59:04 @ 0x7fe82d39f2c1 process::ProcessBase::consume()
> 10:59:04 @ 0x7fe82d3b84da process::ProcessManager::resume()
> 10:59:04 @ 0x7fe82d3bbf56 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 10:59:04 @ 0x7fe82d577870 execute_native_thread_routine
> 10:59:04 @ 0x7fe82a761e25 start_thread
> 10:59:04 @ 0x7fe82986334d __clone
> {noformat}
> Full test log is attached.
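The race hypothesized in the comment above can be illustrated with a toy model. This is a minimal sketch under stated assumptions and is NOT the libprocess API: `Actor`, `Runtime`, `dispatch()`, `settle()`, `Outcome` and `simulate()` are hypothetical names. It assumes, as the comment suggests, that a deferred continuation is addressed to the master by name (loosely like the `UPID` visible in the dispatch frame of the stack trace) and that the name is only resolved when the continuation runs. If a new master is registered under the same name before the pending work drains, the continuation runs on the new incarnation, which does not own the in-flight request. Draining pending work right after the old master is reset (loosely analogous to the proposed `Clock::settle()`) means the stale continuation finds nothing registered and is dropped instead.

```cpp
#include <deque>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy model of name-addressed actors; this is NOT the libprocess API.
struct Actor {
  int generation;                       // which incarnation of "master" this is
  std::map<int, std::string> requests;  // in-flight requests owned by this incarnation
};

struct Runtime {
  // Name -> current actor instance; a name can be reused after failover.
  std::map<std::string, std::shared_ptr<Actor>> registry;
  // Pending messages, each addressed by name rather than by instance.
  std::deque<std::pair<std::string, std::function<void(Actor&)>>> queue;

  void dispatch(const std::string& name, std::function<void(Actor&)> fn) {
    queue.emplace_back(name, std::move(fn));
  }

  // Runs every queued message; the name is resolved only at this point.
  // Messages for names with no registered actor are dropped.
  void settle() {
    while (!queue.empty()) {
      auto item = std::move(queue.front());
      queue.pop_front();
      auto it = registry.find(item.first);
      if (it != registry.end()) {
        item.second(*it->second);
      }
    }
  }
};

struct Outcome {
  int ranOnGeneration = 0;    // incarnation that ran the continuation (0 = never ran)
  bool requestFound = false;  // did that incarnation own request 7?
};

// Incarnation 1 defers work on request 7, is reset, and incarnation 2
// comes up under the same name ("master failover").
Outcome simulate(bool settleAfterReset) {
  Runtime rt;
  rt.registry["master"] = std::make_shared<Actor>(Actor{1, {}});
  rt.registry["master"]->requests[7] = "hypothetical request";

  Outcome out;
  rt.dispatch("master", [&out](Actor& self) {
    out.ranOnGeneration = self.generation;
    out.requestFound = self.requests.count(7) == 1;
  });

  rt.registry.erase("master");  // the old master is reset
  if (settleAfterReset) {
    rt.settle();  // drain now: the stale continuation has no target and is dropped
  }

  rt.registry["master"] = std::make_shared<Actor>(Actor{2, {}});  // new master
  rt.settle();
  return out;
}
```

With `simulate(false)`, the continuation queued by incarnation 1 runs on incarnation 2, which has no record of request 7 (loosely analogous to the null `Owned` that trips the CHECK in `_consume()`). With `simulate(true)`, the continuation is dropped before the new master exists, so it never runs on the wrong instance.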