[ https://issues.apache.org/jira/browse/MESOS-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412454#comment-16412454 ]

Benjamin Mahler commented on MESOS-8687:
----------------------------------------

I'm stumped on how this happened, so I added a CHECK to help diagnose any 
future instances of the crash:

{noformat}
commit 2132b66c37bfd5ba7ae1cde74aa8ddb8d6c23bce (HEAD -> master)
Author: Benjamin Mahler <bmah...@apache.org>
Date:   Fri Mar 23 23:27:26 2018 -0700

    Added a temporary CHECK to diagnose MESOS-8687.

    From the stack trace in MESOS-8687, it appears the `httpSequence`
    within `ProcessBase` was null during `ProcessBase::_consume`. In
    order to diagnose how this can occur, this adds a CHECK that will
    print the pid and endpoint name.
{noformat}
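
For readers without the commit at hand, the idea can be sketched roughly as
follows. This is not the code from the commit: `FakeProcess`, `Sequence`, and
`diagnoseHttpSequence` are hypothetical stand-ins, and the sketch returns the
diagnostic message instead of aborting as a real CHECK would. It only
illustrates attaching the pid and endpoint name to a null-pointer check.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT the libprocess types.
struct Sequence {};

struct FakeProcess {
  std::string pid;                         // e.g. "master@127.0.0.1:5050"
  std::unique_ptr<Sequence> httpSequence;  // null is the failure case
};

// Returns an empty string when `httpSequence` is set. Otherwise returns a
// diagnostic naming the pid and endpoint. A real CHECK would log such a
// message and abort the process rather than return it.
std::string diagnoseHttpSequence(
    const FakeProcess& process,
    const std::string& endpoint)
{
  if (process.httpSequence != nullptr) {
    return "";
  }

  std::ostringstream out;
  out << "httpSequence is null for pid '" << process.pid
      << "' while handling endpoint '" << endpoint << "'";
  return out.str();
}

// Example usage: a process whose sequence was never initialized.
void example()
{
  FakeProcess process{"master@127.0.0.1:5050", nullptr};
  assert(!diagnoseHttpSequence(process, "api/v1").empty());
}
```

The point of carrying the pid and endpoint in the message is that a bare
"Must be non NULL" failure, as in the trace below, does not say which process
or route was being served when the pointer was found to be null.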

> Check failure in `ProcessBase::_consume()`.
> -------------------------------------------
>
>                 Key: MESOS-8687
>                 URL: https://issues.apache.org/jira/browse/MESOS-8687
>             Project: Mesos
>          Issue Type: Bug
>          Components: libprocess
>    Affects Versions: 1.6.0
>         Environment: ec2 CentOS 7 with SSL
>            Reporter: Alexander Rukletsov
>            Priority: Major
>              Labels: flaky-test, reliability
>         Attachments: MasterFailover-badrun.txt
>
>
> Observed a check failure in the {{MasterAPITest.MasterFailover}} test:
> {noformat}
> 10:59:04 I0319 10:59:04.312197  3274 master.cpp:649] Authorization enabled
> 10:59:04 F0319 10:59:04.312772  3274 owned.hpp:110] Check failed: 'get()' Must be non NULL
> 10:59:04 *** Check failure stack trace: ***
> 10:59:04 I0319 10:59:04.313470  3279 hierarchical.cpp:175] Initialized hierarchical allocator process
> 10:59:04 I0319 10:59:04.313500  3279 whitelist_watcher.cpp:77] No whitelist given
> 10:59:04     @     0x7fe82d44e0cd  google::LogMessage::Fail()
> 10:59:04     @     0x7fe82d44ff1d  google::LogMessage::SendToLog()
> 10:59:04     @     0x7fe82d44dcb3  google::LogMessage::Flush()
> 10:59:04     @     0x7fe82d450919  google::LogMessageFatal::~LogMessageFatal()
> 10:59:04     @     0x7fe82d3cee16  google::CheckNotNull<>()
> 10:59:04     @     0x7fe82d3b4253  process::ProcessBase::_consume()
> 10:59:04     @     0x7fe82d3b4a66  _ZNO6lambda12CallableOnceIFN7process6FutureINS1_4http8ResponseEEEvEE10CallableFnINS_8internal7PartialIZNS1_11ProcessBase7consumeEONS1_9HttpEventEEUlRKNS1_5OwnedINS3_7RequestEEEE_JSG_EEEEclEv
> 10:59:04     @     0x7fe82c39c3ca  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureINS1_4http8ResponseEEEEclINS0_IFSE_vEEEEESE_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISD_EESt14default_deleteISQ_EEOSI_S3_E_JST_SI_St12_PlaceholderILi1EEEEEEclEOS3_
> 10:59:04     @     0x7fe82d39f2c1  process::ProcessBase::consume()
> 10:59:04     @     0x7fe82d3b84da  process::ProcessManager::resume()
> 10:59:04     @     0x7fe82d3bbf56  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 10:59:04     @     0x7fe82d577870  execute_native_thread_routine
> 10:59:04     @     0x7fe82a761e25  start_thread
> 10:59:04     @     0x7fe82986334d  __clone
> {noformat}
> Full test log is attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
