[ 
https://issues.apache.org/jira/browse/MESOS-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16452536#comment-16452536
 ] 

Greg Mann commented on MESOS-8687:
----------------------------------

Indeed, it seems likely that {{_consume()}} is dispatched by one instance of 
the master actor, but is actually executed on a second instance of the master 
actor after master failover. As suggested by [~bennoe], perhaps we could add a 
{{Clock::settle()}} immediately after the master is reset in the test.
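
To illustrate the suspected race, here is a minimal standalone sketch. It is a toy model and does not use libprocess: {{ToyActor}}, {{ToyRuntime}} and {{staleDeliveries()}} are hypothetical names invented for this example, and the only behaviour it assumes is that queued events are addressed by actor ID rather than by instance. Under that assumption, an event dispatched to the first master can run on the second one unless the queue is drained right after the reset.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for an actor; `generation` tells instances apart.
struct ToyActor {
  explicit ToyActor(int g) : generation(g), delivered(0) {}
  int generation;
  int delivered;  // Number of events this instance executed.
};

// Toy runtime (NOT libprocess): events are addressed by ID, not by instance.
class ToyRuntime {
public:
  void spawn(const std::string& id, ToyActor* actor) {
    assert(actors.count(id) == 0);  // One live instance per ID.
    actors[id] = actor;
  }

  void terminate(const std::string& id) { actors.erase(id); }

  // Queue an event for whichever actor owns `id` at delivery time.
  void dispatch(const std::string& id, std::function<void(ToyActor&)> f) {
    queue.emplace_back(id, std::move(f));
  }

  // Drain all pending events. An event whose ID has no registered
  // actor at that moment is dropped.
  void settle() {
    while (!queue.empty()) {
      auto event = std::move(queue.front());
      queue.pop_front();
      auto it = actors.find(event.first);
      if (it != actors.end()) {
        event.second(*it->second);
      }
    }
  }

private:
  std::map<std::string, ToyActor*> actors;
  std::deque<std::pair<std::string, std::function<void(ToyActor&)>>> queue;
};

// Returns how many stale events the second "master" instance executed.
int staleDeliveries(bool settleAfterReset) {
  ToyRuntime runtime;
  ToyActor first(1);
  ToyActor second(2);

  runtime.spawn("master", &first);
  runtime.dispatch("master", [](ToyActor& a) { ++a.delivered; });

  runtime.terminate("master");  // Models the master reset in the test.
  if (settleAfterReset) {
    runtime.settle();  // The stale event is drained (dropped) here.
  }

  runtime.spawn("master", &second);  // Failover: same ID, new instance.
  runtime.settle();
  return second.delivered;
}
```

In this model, {{staleDeliveries(false)}} reports that the second instance ran the event meant for the first, while {{staleDeliveries(true)}} reports none. The real fix would still need to be checked against the test itself; as far as I know, Mesos tests call {{Clock::settle()}} with the clock paused.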

> Check failure in `ProcessBase::_consume()`.
> -------------------------------------------
>
>                 Key: MESOS-8687
>                 URL: https://issues.apache.org/jira/browse/MESOS-8687
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 1.6.0
>         Environment: ec2 CentOS 7 with SSL
>            Reporter: Alexander Rukletsov
>            Assignee: Benjamin Mahler
>            Priority: Major
>              Labels: flaky-test, reliability
>         Attachments: MasterAPITest.MasterFailover-with-CHECK.txt, MasterFailover-badrun.txt
>
>
> Observed a crash caused by a failed {{CHECK}} in the {{MasterAPITest.MasterFailover}} test:
> {noformat}
> 10:59:04 I0319 10:59:04.312197  3274 master.cpp:649] Authorization enabled
> 10:59:04 F0319 10:59:04.312772  3274 owned.hpp:110] Check failed: 'get()' Must be non NULL
> 10:59:04 *** Check failure stack trace: ***
> 10:59:04 I0319 10:59:04.313470  3279 hierarchical.cpp:175] Initialized hierarchical allocator process
> 10:59:04 I0319 10:59:04.313500  3279 whitelist_watcher.cpp:77] No whitelist given
> 10:59:04     @     0x7fe82d44e0cd  google::LogMessage::Fail()
> 10:59:04     @     0x7fe82d44ff1d  google::LogMessage::SendToLog()
> 10:59:04     @     0x7fe82d44dcb3  google::LogMessage::Flush()
> 10:59:04     @     0x7fe82d450919  google::LogMessageFatal::~LogMessageFatal()
> 10:59:04     @     0x7fe82d3cee16  google::CheckNotNull<>()
> 10:59:04     @     0x7fe82d3b4253  process::ProcessBase::_consume()
> 10:59:04     @     0x7fe82d3b4a66  _ZNO6lambda12CallableOnceIFN7process6FutureINS1_4http8ResponseEEEvEE10CallableFnINS_8internal7PartialIZNS1_11ProcessBase7consumeEONS1_9HttpEventEEUlRKNS1_5OwnedINS3_7RequestEEEE_JSG_EEEEclEv
> 10:59:04     @     0x7fe82c39c3ca  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureINS1_4http8ResponseEEEEclINS0_IFSE_vEEEEESE_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISD_EESt14default_deleteISQ_EEOSI_S3_E_JST_SI_St12_PlaceholderILi1EEEEEEclEOS3_
> 10:59:04     @     0x7fe82d39f2c1  process::ProcessBase::consume()
> 10:59:04     @     0x7fe82d3b84da  process::ProcessManager::resume()
> 10:59:04     @     0x7fe82d3bbf56  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 10:59:04     @     0x7fe82d577870  execute_native_thread_routine
> 10:59:04     @     0x7fe82a761e25  start_thread
> 10:59:04     @     0x7fe82986334d  __clone
> {noformat}
> Full test log is attached.



