[ https://issues.apache.org/jira/browse/MESOS-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412454#comment-16412454 ]
Benjamin Mahler commented on MESOS-8687:
----------------------------------------

I'm stumped on how this happened, added a CHECK to help diagnose any future instances of the crash:

{noformat}
commit 2132b66c37bfd5ba7ae1cde74aa8ddb8d6c23bce (HEAD -> master)
Author: Benjamin Mahler <bmah...@apache.org>
Date: Fri Mar 23 23:27:26 2018 -0700

    Added a temporary CHECK to diagnose MESOS-8687.

    From the stack trace in MESOS-8687, it appears the `httpSequence`
    within `ProcessBase` was null during `ProcessBase::_consume`. In
    order to diagnose how this can occur, this adds a CHECK that will
    print the pid and endpoint name.
{noformat}

> Check failure in `ProcessBase::_consume()`.
> -------------------------------------------
>
> Key: MESOS-8687
> URL: https://issues.apache.org/jira/browse/MESOS-8687
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Affects Versions: 1.6.0
> Environment: ec2 CentOS 7 with SSL
> Reporter: Alexander Rukletsov
> Priority: Major
> Labels: flaky-test, reliability
> Attachments: MasterFailover-badrun.txt
>
>
> Observed a segfault in the {{MasterAPITest.MasterFailover}} test:
> {noformat}
> 10:59:04 I0319 10:59:04.312197 3274 master.cpp:649] Authorization enabled
> 10:59:04 F0319 10:59:04.312772 3274 owned.hpp:110] Check failed: 'get()'
> Must be non NULL
> 10:59:04 *** Check failure stack trace: ***
> 10:59:04 I0319 10:59:04.313470 3279 hierarchical.cpp:175] Initialized
> hierarchical allocator process
> 10:59:04 I0319 10:59:04.313500 3279 whitelist_watcher.cpp:77] No whitelist
> given
> 10:59:04 @ 0x7fe82d44e0cd google::LogMessage::Fail()
> 10:59:04 @ 0x7fe82d44ff1d google::LogMessage::SendToLog()
> 10:59:04 @ 0x7fe82d44dcb3 google::LogMessage::Flush()
> 10:59:04 @ 0x7fe82d450919 google::LogMessageFatal::~LogMessageFatal()
> 10:59:04 @ 0x7fe82d3cee16 google::CheckNotNull<>()
> 10:59:04 @ 0x7fe82d3b4253 process::ProcessBase::_consume()
> 10:59:04 @ 0x7fe82d3b4a66
> _ZNO6lambda12CallableOnceIFN7process6FutureINS1_4http8ResponseEEEvEE10CallableFnINS_8internal7PartialIZNS1_11ProcessBase7consumeEONS1_9HttpEventEEUlRKNS1_5OwnedINS3_7RequestEEEE_JSG_EEEEclEv
> 10:59:04 @ 0x7fe82c39c3ca
> _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureINS1_4http8ResponseEEEEclINS0_IFSE_vEEEEESE_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISD_EESt14default_deleteISQ_EEOSI_S3_E_JST_SI_St12_PlaceholderILi1EEEEEEclEOS3_
> 10:59:04 @ 0x7fe82d39f2c1 process::ProcessBase::consume()
> 10:59:04 @ 0x7fe82d3b84da process::ProcessManager::resume()
> 10:59:04 @ 0x7fe82d3bbf56
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 10:59:04 @ 0x7fe82d577870 execute_native_thread_routine
> 10:59:04 @ 0x7fe82a761e25 start_thread
> 10:59:04 @ 0x7fe82986334d __clone
> {noformat}
> Full test log is attached.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)