[ 
https://issues.apache.org/jira/browse/KUDU-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-517.
------------------------------
       Resolution: Cannot Reproduce
    Fix Version/s: n/a

We upgraded to gperftools 2.2 a long time ago, and also stopped using HEAPCHECK 
builds (now using ASAN's leaksanitizer instead). I haven't seen this issue in 
years

> SIGSEGV in gperftools GetStackTrace
> -----------------------------------
>
>                 Key: KUDU-517
>                 URL: https://issues.apache.org/jira/browse/KUDU-517
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: M4.5
>            Reporter: Adar Dembo
>             Fix For: n/a
>
>
> Sometimes, during a LEAKCHECK build, the TS gets a SIGSEGV from within 
> tcmalloc's GetStackTrace() call. For example, here it is in a Java unit test:
> {noformat}
> 2014-09-23 15:42:05,296 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>  *** Aborted at 1411512125 (unix time) try "date -d @1411512125" if you are 
> using GNU date ***
> 2014-09-23 15:42:05,298 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>  PC: @           0x6ed17f GetStackTrace()
> 2014-09-23 15:42:05,298 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>  *** SIGSEGV (@0x7faf9d010008) received by PID 28759 (TID 0x7faf9d00f700) 
> from PID 18446744072048672776; stack trace: ***
> 2014-09-23 15:42:05,299 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @       0x32aae0f500 (unknown)
> 2014-09-23 15:42:05,300 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @           0x6ed17f GetStackTrace()
> 2014-09-23 15:42:05,301 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @           0x6e70c9 MallocHook_GetCallerStackTrace
> 2014-09-23 15:42:05,301 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @           0x6f01c6 NewHook()
> 2014-09-23 15:42:05,304 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @           0x6e6bf6 MallocHook::InvokeNewHookSlow()
> 2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @          0x119eca3 tc_new
> 2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @       0x32ad29c3c9 (unknown)
> 2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @       0x32ad29cde5 (unknown)
> 2014-09-23 15:42:05,306 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @       0x32ad29cf33 (unknown)
> 2014-09-23 15:42:05,308 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @           0x822c32 kudu::log::Log::CreatePlaceholderSegment()
> 2014-09-23 15:42:05,311 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @           0x823042 kudu::log::Log::PreAllocateNewSegment()
> 2014-09-23 15:42:05,315 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @           0x82ae43 kudu::log::Log::SegmentAllocationTask::Run()
> 2014-09-23 15:42:05,318 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @          0x10f125c kudu::FutureTask::Run()
> 2014-09-23 15:42:05,321 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @          0x10f879f kudu::ThreadPool::DispatchThread()
> 2014-09-23 15:42:05,323 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @          0x10f571f kudu::Thread::SuperviseThread()
> 2014-09-23 15:42:05,324 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @       0x32aae07851 (unknown)
> 2014-09-23 15:42:05,325 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @       0x32aaae894d (unknown)
> 2014-09-23 15:42:05,325 (kudu-tablet_server) [INFO - 
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
>      @                0x0 (unknown)
> {noformat}
> And here's another example from linked_list-test:
> {noformat}
>     *** Aborted at 1405495265 (unix time) try "date -d @1405495265" if you 
> are using GNU date ***
>     PC: @           0x72032f GetStackTrace()
>     *** SIGSEGV (@0x7fffe9d08d08) received by PID 9817 (TID 0x7fc3a5bce800) 
> from PID 18446744073337343240; stack trace: ***
>         @       0x3918c0f4a0 (unknown) at ??:0
>         @           0x72032f GetStackTrace() at 
> /home/adar/kudu/thirdparty/gperftools-2.1/src/stacktrace_x86-inl.h:325
>         @           0x71a279 MallocHook_GetCallerStackTrace at 
> /home/adar/kudu/thirdparty/gperftools-2.1/src/malloc_hook.cc:666
>         @           0x723376 NewHook() at 
> /home/adar/kudu/thirdparty/gperftools-2.1/src/heap-checker.cc:575
>         @           0x719da6 MallocHook::InvokeNewHookSlow() at 
> /home/adar/kudu/thirdparty/gperftools-2.1/src/malloc_hook.cc:525
>         @           0xa4ed43 tc_new at 
> /home/adar/kudu/thirdparty/gperftools-2.1/src/tcmalloc.cc:1607
>         @       0x30d0c9c3c9 (unknown) at ??:0
>         @       0x30d0c9cde5 (unknown) at ??:0
>         @       0x30d0c9cf33 (unknown) at ??:0
>         @           0x77afb4 
> kudu::master::MasterServiceProxy::MasterServiceProxy() at 
> /home/adar/kudu/src/master/master.proxy.cc:16
>         @           0x6e0eff kudu::ExternalMiniCluster::master_proxy() at 
> /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:231
>         @           0x6df2f7 
> kudu::ExternalMiniCluster::WaitForTabletServerCount() at 
> /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:204
>         @           0x6de6f9 kudu::ExternalMiniCluster::Start() at 
> /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:117
>         @           0x6cd5d5 kudu::LinkedListTest::RestartCluster() at 
> /home/adar/kudu/src/integration-tests/linked_list-test.cc:120
>         @           0x6c98e8 
> kudu::LinkedListTest_TestLoadAndVerify_Test::TestBody() at 
> /home/adar/kudu/src/integration-tests/linked_list-test.cc:483
> {noformat}
> Both Todd and I tried to fix this by switching gperftools over to using 
> libunwind for stack traces. Both times a bunch of unit tests slowed down 
> considerably, leading us to abandon our efforts. You can see the gerrits here:
> http://gerrit.sjc.cloudera.com:8080/#/c/3477/
> http://gerrit.sjc.cloudera.com:8080/#/c/3520/
> Todd said that Impala has a hack to make the gperftools unwinder less 
> dangerous. That might be worth exploring. Or perhaps it's a known issue fixed 
> in newer versions of gperftools. 
> https://code.google.com/p/gperftools/issues/detail?id=66 and 
> https://code.google.com/p/gperftools/issues/detail?id=547 seem related, and 
> were fixed as part of gperftools-2.2 (we're still on 2.1).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to