[ https://issues.apache.org/jira/browse/KUDU-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152851#comment-17152851 ]

Grant Henke commented on KUDU-3030:
-----------------------------------

Adding a bit more detail, we have seen the following output a few times now when the server crashes with this issue:

{code}
*** Aborted at 1592719530 (unix time) try "date -d @1592719530" if you are using GNU date ***
PC: @ 0xa7e7f5 (unknown)
*** SIGSEGV (@0x7f327a18ff08) received by PID 93219 (TID 0x7f327a184700) from PID 2048458504; stack trace: ***
    @ 0x7f3289af35d0 (unknown)
    @ 0xa7e7f5 (unknown)
    @ 0xa7eecb GetStackTrace()
    @ 0x931b21 (unknown)
    @ 0x24ba868 tcmalloc::allocate_full_cpp_throw_oom()
    @ 0x7f3288667a19 std::string::_Rep::_S_create()
    @ 0x7f32886692a1 std::string::_S_construct<>()
    @ 0x7f32886696d8 std::string::string()
    @ 0x22094ac (unknown)
    @ 0x22710d1 kudu::pb_util::WritePBContainerToPath()
    @ 0xbd111d kudu::tablet::TabletMetadata::ReplaceSuperBlockUnlocked()
    @ 0xbd6c72 kudu::tablet::TabletMetadata::Flush()
    @ 0xbd864d kudu::tablet::TabletMetadata::UpdateAndFlush()
    @ 0xb5808b kudu::tablet::Tablet::FlushMetadata()
    @ 0xb5e6fb kudu::tablet::Tablet::DoMergeCompactionOrFlush()
    @ 0xb60ae2 kudu::tablet::Tablet::Compact()
    @ 0xb7a763 kudu::tablet::CompactRowSetsOp::Perform()
    @ 0x223396e kudu::MaintenanceManager::LaunchOp()
    @ 0x228e67e kudu::ThreadPool::DispatchThread()
    @ 0x228778f kudu::Thread::SuperviseThread()
    @ 0x7f3289aebdd5 start_thread
    @ 0x7f3287dc302d __clone
{code}

Processing the resulting minidump resulted in the following stack:

{code}
(gdb) bt
#0  GetStackTrace_x86 (result=0x7fefb343c220, max_depth=31, skip_count=0) at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328
#1  0x0000000000a7eecb in GetStackTrace (result=result@entry=0x7fefb343c220, max_depth=max_depth@entry=31, skip_count=skip_count@entry=1) at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/stacktrace.cc:295
#2  0x0000000000931b21 in DoSampledAllocation (size=size@entry=108) at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1169
#3  0x00000000024ba868 in do_malloc (size=108) at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1361
#4  do_allocate_full<tcmalloc::cpp_throw_oom> (size=108) at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1751
#5  tcmalloc::allocate_full_cpp_throw_oom (size=108) at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1765
#6  0x00007fefbb915a19 in ?? ()
#7  0x0000000006109e00 in ?? ()
#8  0x00007fefbb9172a1 in ?? ()
#9  0x0000000006109e00 in ?? ()
#10 0x00007fefb343c480 in ?? ()
#11 0x00007fefb343c4e0 in ?? ()
#12 0x00007fefbb9176d8 in ?? ()
#13 0x0000000000000000 in ?? ()
{code}

In the short term we should probably revert [3175ed0|https://github.com/apache/kudu/commit/3175ed07df9c9280adec08fea18d15acbd45a4dc] until we have time to address/fix this issue more completely. The downside of periodic crashes drastically outweighs the benefit of heap sampling.

> Crash in tcmalloc stack unwinder
> --------------------------------
>
>                 Key: KUDU-3030
>                 URL: https://issues.apache.org/jira/browse/KUDU-3030
>             Project: Kudu
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 1.11.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> We recently saw a crash where the tcmalloc heap profiler was trying to unwind
> the stack, and ended up accessing invalid memory. The issue here is that
> tcmalloc is relying on frame pointers for heap unwinding, but this particular
> stack trace was going through libstdc++, which was installed on the system
> and doesn't have frame pointers. "usually" this works OK, but when we get
> unlucky, we can crash.

--
This message was sent by Atlassian Jira (v8.3.4#803005)