[ https://issues.apache.org/jira/browse/MESOS-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560115#comment-14560115 ]
Ian Downes commented on MESOS-2771:
-----------------------------------

It's expected that containerizer->usage() fails when the call is for an unknown container. Previously this was just ignored and no subsequent calls were attempted for that container.

Could this be a dangling reference to executorInfo captured for the onFailed() lambda?
{code}
  ExecutorInfo executorInfo = monitored[containerId];

  return containerizer->usage(containerId)
    .then(defer(
        self(),
        &ResourceMonitorProcess::_usage,
        containerId,
        executorInfo,
        lambda::_1))
    .onFailed([&containerId, &executorInfo](const string& failure) {
      LOG(WARNING) << "Failed to get resource usage for "
                   << " container " << containerId
                   << " for executor " << executorInfo.executor_id()
                   << " of framework " << executorInfo.framework_id()
                   << ": " << failure;
    })
{code}

> SIGSEGV received during MesosContainerizerProcess::usage()
> ----------------------------------------------------------
>
>                 Key: MESOS-2771
>                 URL: https://issues.apache.org/jira/browse/MESOS-2771
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Yan Xu
>
> Observed in production.
> {noformat:title=slave log}
> I0523 17:03:59.830229 56587 port_mapping.cpp:2616] Freed ephemeral ports
> [33792,34816) for container with pid 47791
> I0523 17:03:59.849773 56587 port_mapping.cpp:2764] Successfully performed
> cleanup for pid 47791
> *** Aborted at 1432400641 (unix time) try "date -d @1432400641" if you are
> using GNU date ***
> PC: @ 0x7f100fcbfd85
> _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal5slave15ResourceMonitor5UsageEE8onFailedIZNS7_22ResourceMonitorProcess5usageENS5_11ContainerIDEEUlS1_E_vEERKSA_OT_NSA_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
> I0523 17:03:59.898959 56587 slave.cpp:3246] Executor
> 'thermos-1432400210944-mesos-test-exhaust_diskspace-5-4744d0fb-e0a1-4e40-bb22-56bd5cbd9524'
> of framework 201103282247-0000000019-0000 terminated with signal Killed
> I0523 17:04:03.419869 56587 slave.cpp:2547] Handling status update
> TASK_FAILED (UUID: 3be19404-f737-4a70-a330-d1d924a85dbb) for task
> 1432400210944-mesos-test-exhaust_diskspace-5-4744d0fb-e0a1-4e40-bb22-56bd5cbd9524
> of framework 201103282247-0000000019-0000 from @0.0.0.0:0
> I0523 17:04:03.773061 56587 slave.cpp:4077] Received a new estimation of the
> oversubscribable resources
> I0523 17:04:03.773907 56587 slave.cpp:4077] Received a new estimation of the
> oversubscribable resources
> I0523 17:04:03.774683 56587 slave.cpp:4077] Received a new estimation of the
> oversubscribable resources
> I0523 17:04:03.776345 56587 slave.cpp:4077] Received a new estimation of the
> oversubscribable resources
> *** SIGSEGV (@0x0) received by PID 56573 (TID 0x7f100a190940) from PID 0;
> stack trace: ***
> @ 0x7f100f181ca0 (unknown)
> @ 0x7f100fcbfd85
> _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal5slave15ResourceMonitor5UsageEE8onFailedIZNS7_22ResourceMonitorProcess5usageENS5_11ContainerIDEEUlS1_E_vEERKSA_OT_NSA_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
> @ 0x7f100fb01506 process::internal::run<>()
> @ 0x7f100fcc701b process::Future<>::fail()
> @ 0x7f100fccfbde process::internal::thenf<>()
> @ 0x7f100fd64bee
> _ZN7process8internal3runISt8functionIFvRKNS_6FutureIN5mesos18ResourceStatisticsEEEEEJRS6_EEEvRKSt6vectorIT_SaISD_EEDpOT0_
> @ 0x7f100fd656dd process::Future<>::fail()
> @ 0x7f100fd6c332 process::Promise<>::associate()
> @ 0x7f100fe2777e
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos18ResourceStatisticsENS5_8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDESA_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_ET2_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f101015561a process::ProcessManager::resume()
> @ 0x7f10101558dc process::schedule()
> @ 0x7f100f17983d start_thread
> @ 0x7f100e96bfcd clone
> /usr/local/bin/mesos-slave.sh: line 102: 56573 Segmentation fault (core
> dumped) $debug /usr/local/sbin/mesos-slave "${MESOS_FLAGS[@]}"
> Slave Exit Status: 139
> {noformat}
> {noformat:title=gdb core dump}
> Thread 20 (Thread 0x7f100a190940 (LWP 56574)):
> #0 _M_data (__functor=Unhandled dwarf expression opcode 0xf3
> ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/basic_string.h:293
> #1 _M_rep (__functor=Unhandled dwarf expression opcode 0xf3
> ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/basic_string.h:301
> #2 size (__functor=Unhandled dwarf expression opcode 0xf3
> ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/basic_string.h:716
> #3 operator<< <char, std::char_traits<char>, std::allocator<char> >
> (__functor=Unhandled dwarf expression opcode 0xf3
> ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/bits/basic_string.h:2758
> #4 operator<< (__functor=Unhandled dwarf expression opcode 0xf3
> ) at ../include/mesos/type_utils.hpp:267
> #5 operator() (__functor=Unhandled dwarf expression opcode 0xf3
> ) at slave/monitor.cpp:129
> #6 operator() (__functor=Unhandled dwarf expression opcode 0xf3
> ) at ../3rdparty/libprocess/include/process/future.hpp:220
> #7 std::_Function_handler<void(const std::basic_string<char,
> std::char_traits<char>, std::allocator<char> >&),
> process::Future<T>::onFailed(F&&, process::Future<T>::Prefer) const [with F =
> mesos::internal::slave::ResourceMonitorProcess::usage(mesos::ContainerID)::__lambda180;
> <template-parameter-2-2> = void; T =
> mesos::internal::slave::ResourceMonitor::Usage]::__lambda2>::_M_invoke(const
> std::_Any_data &, const std::basic_string<char, std::char_traits<char>,
> std::allocator<char> > &) (__functor=Unhandled dwarf expression opcode 0xf3
> ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/functional:2071
> #8 0x00007f100fb01506 in process::internal::run<std::function<void(const
> std::basic_string<char>&)>, std::basic_string<char, std::char_traits<char>,
> std::allocator<char> >&>(const std::vector<std::function<void(const
> std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>,
> std::allocator<std::function<void(const std::basic_string<char,
> std::char_traits<char>, std::allocator<char> >&)> > > &)
> (callbacks=std::vector of length 1, capacity 1 = {...})
> at ../3rdparty/libprocess/include/process/future.hpp:420
> #9 0x00007f100fcc701b in
> process::Future<mesos::internal::slave::ResourceMonitor::Usage>::fail
> (this=0x7f0ffc185ca8, _message="Unknown container:
> c0ab6cd3-fe4f-49bd-8dd6-32b388fcfab2")
> at ../3rdparty/libprocess/include/process/future.hpp:1406
> #10 0x00007f100fccfbde in fail (f=Unhandled dwarf expression opcode 0xf3
> ) at ../3rdparty/libprocess/include/process/future.hpp:649
> #11 process::internal::thenf<mesos::ResourceStatistics,
> mesos::internal::slave::ResourceMonitor::Usage>(const
> std::function<process::Future<mesos::internal::slave::ResourceMonitor::Usage>(const
> mesos::ResourceStatistics&)> &, const
> std::shared_ptr<process::Promise<mesos::internal::slave::ResourceMonitor::Usage>
> > &, const process::Future<mesos::ResourceStatistics> &) (f=Unhandled dwarf
> expression opcode 0xf3
> ) at ../3rdparty/libprocess/include/process/future.hpp:1193
> #12 0x00007f100fd64bee in operator() (callbacks=std::vector of length 1,
> capacity 1 = {...}) at
> /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/functional:2464
> #13 process::internal::run<std::function<void(const
> process::Future<mesos::ResourceStatistics>&)>,
> process::Future<mesos::ResourceStatistics>&>(const
> std::vector<std::function<void(const
> process::Future<mesos::ResourceStatistics>&)>,
> std::allocator<std::function<void(const
> process::Future<mesos::ResourceStatistics>&)> > > &) (callbacks=std::vector
> of length 1, capacity 1 = {...}) at
> ../3rdparty/libprocess/include/process/future.hpp:420
> #14 0x00007f100fd656dd in process::Future<mesos::ResourceStatistics>::fail
> (this=0x7f0ff8046230, _message="Unknown container:
> c0ab6cd3-fe4f-49bd-8dd6-32b388fcfab2") at
> ../3rdparty/libprocess/include/process/future.hpp:1407
> #15 0x00007f100fd6c332 in onFailed (this=Unhandled dwarf expression opcode
> 0xf3
> ) at ../3rdparty/libprocess/include/process/future.hpp:1121
> #16 onFailed<std::_Bind<std::_Mem_fn<bool
> (process::Future<mesos::ResourceStatistics>::*)(const
> std::basic_string<char>&)>(process::Future<mesos::ResourceStatistics>,
> std::_Placeholder<1>)>, bool> (this=Unhandled dwarf expression opcode 0xf3
> )
> at ../3rdparty/libprocess/include/process/future.hpp:221
> #17 onFailed<std::_Bind<std::_Mem_fn<bool
> (process::Future<mesos::ResourceStatistics>::*)(const
> std::basic_string<char>&)>(process::Future<mesos::ResourceStatistics>,
> std::_Placeholder<1>)> > (this=Unhandled dwarf expression opcode 0xf3
> )
> at ../3rdparty/libprocess/include/process/future.hpp:270
> #18 process::Promise<mesos::ResourceStatistics>::associate (this=Unhandled
> dwarf expression opcode 0xf3
> ) at ../3rdparty/libprocess/include/process/future.hpp:635
> #19 0x00007f100fe2777e in operator() (__functor=Unhandled dwarf expression
> opcode 0xf3
> ) at ../3rdparty/libprocess/include/process/dispatch.hpp:239
> #20 std::_Function_handler<void(process::ProcessBase*),
> process::dispatch(const process::PID<T>&, process::Future<T> (T::*)(P0), A0)
> [with R = mesos::ResourceStatistics; T =
> mesos::internal::slave::MesosContainerizerProcess; P0 = const
> mesos::ContainerID&; A0 = mesos::ContainerID]::__lambda21>::_M_invoke(const
> std::_Any_data &, process::ProcessBase *) (__functor=Unhandled dwarf
> expression opcode 0xf3
> ) at /opt/rh/devtoolset-2/root/usr/include/c++/4.8.2/functional:2071
> #21 0x00007f101015561a in process::ProcessManager::resume (this=0xc24d20,
> process=0x7f0ffc0169b0) at src/process.cpp:2172
> #22 0x00007f10101558dc in process::schedule (arg=Unhandled dwarf expression
> opcode 0xf3
> ) at src/process.cpp:602
> #23 0x00007f100f17983d in start_thread () from /lib64/libpthread.so.0
> #24 0x00007f100e96bfcd in clone () from /lib64/libc.so.6
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
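
For reference, the dangling-reference hypothesis raised in the comment can be illustrated with a small standalone sketch. It assumes nothing about libprocess internals: FailureCallbacks, registerCallback and demo are hypothetical names invented for this example, not Mesos or libprocess APIs. The sketch registers a failure callback inside a function and invokes it only after that function has returned, which is the situation in which by-reference captures of parameters or locals would dangle and by-value captures stay valid.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a future's list of failure callbacks. This is
// NOT the libprocess API; it only models "register now, invoke later".
struct FailureCallbacks {
  std::vector<std::function<void(const std::string&)>> callbacks;

  void onFailed(std::function<void(const std::string&)> f) {
    callbacks.push_back(std::move(f));
  }

  void fail(const std::string& message) {
    for (const auto& f : callbacks) {
      f(message);
    }
  }
};

// Registers a callback and returns before the callback ever runs.
// containerId (a by-value parameter) and executorId (a local) are both
// destroyed when this function returns.
void registerCallback(FailureCallbacks& future,
                      std::vector<std::string>& log,
                      std::string containerId) {
  std::string executorId = "thermos-1";

  // Capturing [&containerId, &executorId] here would leave the closure
  // holding references to destroyed objects by the time fail() runs.
  // Capturing them by value copies both strings into the closure.
  // `log` is captured by reference only because the caller keeps it alive.
  future.onFailed([containerId, executorId, &log](const std::string& failure) {
    log.push_back("Failed to get resource usage for container " + containerId +
                  " for executor " + executorId + ": " + failure);
  });
}

// Drives the example: the callback fires after registerCallback() returned.
std::string demo() {
  FailureCallbacks future;
  std::vector<std::string> log;
  registerCallback(future, log, "c0ab6cd3");
  future.fail("Unknown container");
  assert(log.size() == 1);
  return log.at(0);
}
```

If the hypothesis holds, the analogous change in slave/monitor.cpp would be to capture containerId and executorInfo by value in the onFailed() lambda. That is a suggestion to check against the actual code, not a confirmed fix.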