[ https://issues.apache.org/jira/browse/KUDU-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152851#comment-17152851 ]

Grant Henke commented on KUDU-3030:
-----------------------------------

Adding a bit more detail: we have seen the following output a few times now when the server crashes with this issue:
{code}
*** Aborted at 1592719530 (unix time) try "date -d @1592719530" if you are using GNU date ***
PC: @           0xa7e7f5 (unknown)
*** SIGSEGV (@0x7f327a18ff08) received by PID 93219 (TID 0x7f327a184700) from PID 2048458504; stack trace: ***
    @     0x7f3289af35d0 (unknown)
    @           0xa7e7f5 (unknown)
    @           0xa7eecb GetStackTrace()
    @           0x931b21 (unknown)
    @          0x24ba868 tcmalloc::allocate_full_cpp_throw_oom()
    @     0x7f3288667a19 std::string::_Rep::_S_create()
    @     0x7f32886692a1 std::string::_S_construct<>()
    @     0x7f32886696d8 std::string::string()
    @          0x22094ac (unknown)
    @          0x22710d1 kudu::pb_util::WritePBContainerToPath()
    @           0xbd111d kudu::tablet::TabletMetadata::ReplaceSuperBlockUnlocked()
    @           0xbd6c72 kudu::tablet::TabletMetadata::Flush()
    @           0xbd864d kudu::tablet::TabletMetadata::UpdateAndFlush()
    @           0xb5808b kudu::tablet::Tablet::FlushMetadata()
    @           0xb5e6fb kudu::tablet::Tablet::DoMergeCompactionOrFlush()
    @           0xb60ae2 kudu::tablet::Tablet::Compact()
    @           0xb7a763 kudu::tablet::CompactRowSetsOp::Perform()
    @          0x223396e kudu::MaintenanceManager::LaunchOp()
    @          0x228e67e kudu::ThreadPool::DispatchThread()
    @          0x228778f kudu::Thread::SuperviseThread()
    @     0x7f3289aebdd5 start_thread
    @     0x7f3287dc302d __clone
{code}
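The abort banner suggests `date -d` to decode its timestamp. Below is a minimal C++ equivalent that prints the crash time in UTC. It assumes a POSIX system for `gmtime_r()`, and the helper name `FormatUnixTimeUtc` is hypothetical.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Formats a Unix timestamp (seconds since the epoch) as a UTC date string,
// similar to what `date -u -d @<seconds>` reports.
std::string FormatUnixTimeUtc(std::time_t seconds) {
  std::tm utc{};
  // gmtime_r() is the thread-safe POSIX variant of std::gmtime().
  gmtime_r(&seconds, &utc);
  char buf[32];
  std::size_t n = std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", &utc);
  assert(n > 0);  // strftime returns 0 if the buffer was too small
  return std::string(buf, n);
}
```

For the banner above, `FormatUnixTimeUtc(1592719530)` yields `2020-06-21 06:05:30 UTC`.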

Processing the resulting minidump with gdb produced the following stack:
{code}
(gdb) bt
#0  GetStackTrace_x86 (result=0x7fefb343c220, max_depth=31, skip_count=0)
    at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/stacktrace_x86-inl.h:328
#1  0x0000000000a7eecb in GetStackTrace (result=result@entry=0x7fefb343c220,
    max_depth=max_depth@entry=31, skip_count=skip_count@entry=1)
    at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/stacktrace.cc:295
#2  0x0000000000931b21 in DoSampledAllocation (size=size@entry=108)
    at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1169
#3  0x00000000024ba868 in do_malloc (size=108)
    at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1361
#4  do_allocate_full<tcmalloc::cpp_throw_oom> (size=108)
    at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1751
#5  tcmalloc::allocate_full_cpp_throw_oom (size=108)
    at /usr/src/debug/kudu-1.7.0-cdh5.16.2/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1765
#6  0x00007fefbb915a19 in ?? ()
#7  0x0000000006109e00 in ?? ()
#8  0x00007fefbb9172a1 in ?? ()
#9  0x0000000006109e00 in ?? ()
#10 0x00007fefb343c480 in ?? ()
#11 0x00007fefb343c4e0 in ?? ()
#12 0x00007fefbb9176d8 in ?? ()
#13 0x0000000000000000 in ?? ()
{code}

In the short term, we should probably revert [3175ed0|https://github.com/apache/kudu/commit/3175ed07df9c9280adec08fea18d15acbd45a4dc] until we have time to fix this issue more completely. The downside of periodic crashes drastically outweighs the benefit of heap sampling.
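In the gdb stack, the fault happens under `DoSampledAllocation()`. This means only allocations that the heap sampler selects for a stack capture ever reach the unwinder, which helps explain why the crash is periodic rather than constant.

The sketch below is a hypothetical, simplified model of byte-based sampling in the spirit of tcmalloc's sampler, not gperftools code. The class name, the exponential gap distribution, and the 512 KiB mean used in the example are illustrative assumptions. Each allocation draws down a randomized byte budget, and only when that budget is exhausted would a stack trace be taken.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical, simplified model of byte-based heap sampling (not gperftools
// code). The number of bytes between samples is drawn from an exponential
// distribution, so on average one allocation is sampled per `mean_bytes`
// bytes allocated. A sampled allocation is the kind that would unwind the
// stack via GetStackTrace().
class HypotheticalAllocationSampler {
 public:
  HypotheticalAllocationSampler(double mean_bytes, std::uint32_t seed)
      : rng_(seed), gap_(1.0 / mean_bytes) {
    assert(mean_bytes > 0);
    bytes_until_sample_ = gap_(rng_);
  }

  // Returns true if this allocation would be sampled.
  bool ShouldSample(std::size_t size) {
    bytes_until_sample_ -= static_cast<double>(size);
    if (bytes_until_sample_ > 0) return false;
    bytes_until_sample_ = gap_(rng_);  // choose the next sampling point
    return true;
  }

 private:
  std::mt19937 rng_;
  std::exponential_distribution<double> gap_;
  double bytes_until_sample_ = 0;
};

// Counts how many of `count` allocations of `size` bytes would be sampled.
std::size_t CountSampledAllocations(double mean_bytes, std::size_t size,
                                    std::size_t count, std::uint32_t seed) {
  HypotheticalAllocationSampler sampler(mean_bytes, seed);
  std::size_t sampled = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if (sampler.ShouldSample(size)) ++sampled;
  }
  return sampled;
}
```

For example, 1,000,000 allocations of 108 bytes (the size in frame #2) with a 512 KiB mean are sampled only around 200 times. Each individual allocation therefore has a small chance of hitting the unwinder, but a busy server makes many allocations.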

> Crash in tcmalloc stack unwinder
> --------------------------------
>
>                 Key: KUDU-3030
>                 URL: https://issues.apache.org/jira/browse/KUDU-3030
>             Project: Kudu
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 1.11.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> We recently saw a crash where the tcmalloc heap profiler was trying to unwind
> the stack, and ended up accessing invalid memory. The issue here is that
> tcmalloc relies on frame pointers for stack unwinding, but this particular
> stack trace was going through libstdc++, which was installed on the system
> and doesn't have frame pointers. "Usually" this works OK, but when we get
> unlucky, we can crash.
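The failure mode described in the issue can be modeled without touching real memory. The sketch below is a hypothetical simulation, not gperftools' `GetStackTrace_x86()`, and it rests on the following assumptions:

* It assumes the x86-64 frame-pointer layout, where [fp] holds the caller's frame pointer and [fp + 8] the return address.
* It models memory as a map in which a missing address stands for an unmapped page.
* It applies sanity checks in the style of gperftools' (the thresholds are illustrative).

A frame compiled without frame pointers may leave an arbitrary value in %rbp. If that value happens to pass the checks, the next load faults.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Simulated address space: 8-byte-aligned address -> 8-byte word.
// An address missing from the map models an unmapped page.
using Memory = std::unordered_map<std::uint64_t, std::uint64_t>;

struct UnwindResult {
  std::vector<std::uint64_t> return_addresses;
  bool faulted = false;  // true where a real unwinder would take SIGSEGV
};

// Reads one word; std::nullopt models a segfault on an unmapped address.
static std::optional<std::uint64_t> Load(const Memory& mem, std::uint64_t addr) {
  auto it = mem.find(addr);
  if (it == mem.end()) return std::nullopt;
  return it->second;
}

// Hypothetical frame-pointer walk (x86-64 layout assumed):
// [fp] = caller's frame pointer, [fp + 8] = return address.
UnwindResult UnwindFramePointers(const Memory& mem, std::uint64_t fp, int max_depth) {
  UnwindResult result;
  while (fp != 0 && static_cast<int>(result.return_addresses.size()) < max_depth) {
    auto ret = Load(mem, fp + 8);
    auto next_fp = Load(mem, fp);
    if (!ret || !next_fp) {  // dereferenced garbage: the real crash
      result.faulted = true;
      break;
    }
    result.return_addresses.push_back(*ret);
    // Heuristics in the style of gperftools (thresholds illustrative): the
    // stack grows down, so the caller's frame must be higher, aligned, and
    // nearby. A garbage %rbp value can still pass all three checks.
    if (*next_fp <= fp || (*next_fp & 7) != 0 || *next_fp - fp > 100000) break;
    fp = *next_fp;
  }
  return result;
}
```

Consider a frame whose "saved frame pointer" slot holds an aligned, slightly higher, but unmapped address, as can happen when the code that owns the frame uses %rbp as a general-purpose register. In that case the walk records two return addresses and then reports `faulted`, which is the point where the real unwinder would take SIGSEGV. If the garbage value were lower than the current frame, the heuristics would stop the walk cleanly instead.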



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
