[ 
https://issues.apache.org/jira/browse/KUDU-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104376#comment-17104376
 ] 

RuiChen commented on KUDU-3096:
-------------------------------

I hit the same issue on my machine (Ubuntu 18.04, aarch64 VM) when I run the 
test case DebugUtilTest.TestThreadBlockingSignals 100 times in a row. From the 
core dump file, the function *GetThreadStack* tried to collect stack info from 
a thread whose initialization had not completed, and that makes tcmalloc crash 
in ThreadCache. In the core dump info given by huangtianhua you can see that 
the signal handler is called while the tcmalloc ThreadCache is still being 
initialized. The handler then collects the thread's stack, which allocates 
through tcmalloc and accesses variables that are not yet initialized.
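
To make the ordering problem concrete, here is a minimal sketch of one way to keep a handler from running while a new thread is still initializing: the creator blocks the signal before starting the thread (the new thread inherits the mask), and the thread unblocks it only after its setup is done. This is only an illustration under stated assumptions, not the fix adopted in Kudu or gperftools. The names kDemoSignal, DemoHandler, DemoResult and RunDemo are hypothetical, SIGUSR2 is used as a stand-in signal, and the "initialization" is reduced to setting a flag.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <pthread.h>
#include <thread>

// Hypothetical stand-in for the signal used to request a stack trace.
constexpr int kDemoSignal = SIGUSR2;

std::atomic<bool> g_init_done{false};
std::atomic<int> g_during_init{0};  // handler runs seen before init finished
std::atomic<int> g_after_init{0};   // handler runs seen after init finished

// The handler only touches atomics; it never allocates.
void DemoHandler(int /*signo*/) {
  if (g_init_done.load()) {
    g_after_init.fetch_add(1);
  } else {
    g_during_init.fetch_add(1);
  }
}

struct DemoResult {
  int during_init;
  int after_init;
};

DemoResult RunDemo() {
  g_init_done = false;
  g_during_init = 0;
  g_after_init = 0;

  struct sigaction sa = {};
  sa.sa_handler = DemoHandler;
  sigemptyset(&sa.sa_mask);
  if (sigaction(kDemoSignal, &sa, nullptr) != 0) return {-1, -1};

  // Block the signal in the creating thread so that the new thread starts
  // with it blocked (a new thread inherits its creator's signal mask).
  sigset_t sigs, old_mask;
  sigemptyset(&sigs);
  sigaddset(&sigs, kDemoSignal);
  pthread_sigmask(SIG_BLOCK, &sigs, &old_mask);

  std::thread t([sigs] {
    // Simulate a stack-trace request arriving in the middle of thread
    // initialization. The signal is blocked, so it only becomes pending.
    pthread_kill(pthread_self(), kDemoSignal);

    // Placeholder for per-thread setup (for example, the allocator's
    // thread cache being initialized by the first allocation).
    g_init_done.store(true);

    // Unblock: the pending signal is delivered here, after init.
    pthread_sigmask(SIG_UNBLOCK, &sigs, nullptr);
  });

  // Restore the creating thread's original mask.
  pthread_sigmask(SIG_SETMASK, &old_mask, nullptr);
  t.join();
  return {g_during_init.load(), g_after_init.load()};
}
```

Calling RunDemo() should report that the handler never ran during the simulated initialization and ran once afterwards. A handler that avoids allocation entirely would be the other direction to look at, since the backtrace shows the crash going through tc_malloc inside libunwind.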

 

root@ubuntu:/home/jenkins/workspace/kudu/build/debug# gdb bin/debug-util-test core.27980
...
(gdb) bt
#0 *tcmalloc::Sampler::RecordAllocation* (k=<optimized out>, this=<optimized out>)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/sampler.h:166
#1 tcmalloc::ThreadCache::SampleAllocation (k=<optimized out>, this=<optimized out>)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/thread_cache.h:489
#2 (anonymous namespace)::do_malloc (size=136)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1360
........
#7 tc_malloc (size=size@entry=136)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1873
#8 0x0000ffff9764f3dc in load_debug_frame (is_local=<optimized out>, bufsize=<synthetic pointer>,
 buf=<synthetic pointer>, file=<optimized out>)
 at /home/jenkins/workspace/kudu/thirdparty/src/libunwind-1.3.1/src/dwarf/Gfind_proc_info-lsb.c:127
...
#18 0x0000ffff9764b6c8 in _ULaarch64_step (cursor=0xffff86a96f10)
 at /home/jenkins/workspace/kudu/thirdparty/src/libunwind-1.3.1/src/aarch64/Gstep.c:146
#19 0x0000ffff9824a4b4 in kudu::StackTrace::Collect (this=0xffffdbaad4a0, skip_frames=0)
 at /home/jenkins/workspace/kudu/src/kudu/util/debug-util.cc:615
#20 0x0000ffff98248c48 in kudu::(anonymous namespace)::HandleStackTraceSignal (info=0xffff86a991b0)
 at /home/jenkins/workspace/kudu/src/kudu/util/debug-util.cc:289
#21 <signal handler called>
#22 tcmalloc::ThreadCache::FreeList::Init (size=16, this=0xaaaaf7da1640)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/thread_cache.h:164
#23 *tcmalloc::ThreadCache::Init* (this=this@entry=0xaaaaf7da1600, tid=tid@entry=281472941011088)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/thread_cache.cc:98
....
#34 0x0000ffff98368454 in *kudu::Thread::SuperviseThread* (arg=0xaaaaf861d200)
 at /home/jenkins/workspace/kudu/src/kudu/util/thread.cc:666
#35 0x0000ffff979e9088 in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#36 0x0000ffff979594ec in ?? () from /lib/aarch64-linux-gnu/libc.so.6

 

The crash happens at this expression: *static_cast<size_t>(bytes_until_sample_)*
{code:java}
inline bool Sampler::RecordAllocation(size_t k) {
  // The first time we enter this function we expect bytes_until_sample_
  // to be zero, and we must call SampleAllocationSlow() to ensure
  // proper initialization of static vars.
  ASSERT(Static::IsInited() || bytes_until_sample_ == 0);

  // Note that we have to deal with arbitrarily large values of k
  // here. Thus we're upcasting bytes_until_sample_ to unsigned rather
  // than the other way around. And this is why this code cannot be
  // merged with DecrementFast code below.
  if (static_cast<size_t>(bytes_until_sample_) < k) {
    bool result = RecordAllocationSlow(k);
    ASSERT(Static::IsInited());
    return result;
  } else {
    bytes_until_sample_ -= k;
    ASSERT(Static::IsInited());
    return true;
  }
}
{code}
 

> debug-util-test failed sometimes on aarch64: Segmentation fault
> ---------------------------------------------------------------
>
>                 Key: KUDU-3096
>                 URL: https://issues.apache.org/jira/browse/KUDU-3096
>             Project: Kudu
>          Issue Type: Sub-task
>            Reporter: huangtianhua
>            Assignee: RuiChen
>            Priority: Major
>
> I tested Kudu on an aarch64 server based on 
> https://gerrit.cloudera.org/#/c/14964/ and the test debug-util-test fails 
> sometimes. Please see the detailed gdb output for the core dump file: 
> http://paste.openstack.org/show/791306/
> root@ubuntu:/home/jenkins/workspace/kudu/build/debug# ./bin/debug-util-test
> ......
> W0330 07:30:44.317989 27980 debug-util.cc:405] Leaking SignalData structure 0xaaaaf89ed260 after lost signal to thread 28015
> W0330 07:30:44.319747 27980 debug-util.cc:405] Leaking SignalData structure 0xaaaaf89ed280 after lost signal to thread 28015
> W0330 07:30:44.319774 27980 debug-util.cc:405] Leaking SignalData structure 0xaaaaf89ed2a0 after lost signal to thread 28015
> W0330 07:30:44.326023 27980 debug-util.cc:405] Leaking SignalData structure 0xaaaaf89ed2c0 after lost signal to thread 28015
> I0330 07:30:44.336513 27980 debug-util-test.cc:463] Timed out 1410 times
> I0330 07:30:44.336531 27980 debug-util-test.cc:464] Succeeded 13591 times
> [       OK ] DebugUtilTest.TestTimeouts (1002 ms)
> [----------] 9 tests from DebugUtilTest (3049 ms total)
> [----------] 4 tests from DifferentRaces/RaceTest
> [ RUN      ] DifferentRaces/RaceTest.TestStackTraceRaces/0
> Segmentation fault (core dumped)
> root@ubuntu:/home/jenkins/workspace/kudu/build/debug# gdb bin/debug-util-test core.27980
> GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "aarch64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from bin/debug-util-test...done.
> [New LWP 28016]
> [New LWP 27980]
> [New LWP 27981]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> Core was generated by `./bin/debug-util-test'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  tcmalloc::Sampler::RecordAllocation (k=<optimized out>, this=<optimized out>)
>     at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/sampler.h:166
> 166       if (static_cast<size_t>(bytes_until_sample_) < k) {
> [Current thread is 1 (Thread 0xffff86a9b090 (LWP 28016))]
> Sometimes other tests such as TestTimeouts also hit a segmentation fault, 
> and the gdb info is the same. We have no idea whether it is related to 
> gperftools. Maybe someone can help us fix this, thanks very much.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
