[
https://issues.apache.org/jira/browse/KUDU-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon resolved KUDU-517.
------------------------------
Resolution: Cannot Reproduce
Fix Version/s: n/a
We upgraded to gperftools 2.2 a long time ago and also stopped using HEAPCHECK
builds (we now use ASAN's LeakSanitizer instead). I haven't seen this issue in
years.
> SIGSEGV in gperftools GetStackTrace
> -----------------------------------
>
> Key: KUDU-517
> URL: https://issues.apache.org/jira/browse/KUDU-517
> Project: Kudu
> Issue Type: Bug
> Components: tserver
> Affects Versions: M4.5
> Reporter: Adar Dembo
> Fix For: n/a
>
>
> Sometimes, during a LEAKCHECK build, the TS gets a SIGSEGV from within
> tcmalloc's GetStackTrace() call. For example, here it is in a Java unit test:
> {noformat}
> 2014-09-23 15:42:05,296 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> *** Aborted at 1411512125 (unix time) try "date -d @1411512125" if you are
> using GNU date ***
> 2014-09-23 15:42:05,298 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> PC: @ 0x6ed17f GetStackTrace()
> 2014-09-23 15:42:05,298 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> *** SIGSEGV (@0x7faf9d010008) received by PID 28759 (TID 0x7faf9d00f700)
> from PID 18446744072048672776; stack trace: ***
> 2014-09-23 15:42:05,299 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x32aae0f500 (unknown)
> 2014-09-23 15:42:05,300 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x6ed17f GetStackTrace()
> 2014-09-23 15:42:05,301 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x6e70c9 MallocHook_GetCallerStackTrace
> 2014-09-23 15:42:05,301 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x6f01c6 NewHook()
> 2014-09-23 15:42:05,304 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x6e6bf6 MallocHook::InvokeNewHookSlow()
> 2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x119eca3 tc_new
> 2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x32ad29c3c9 (unknown)
> 2014-09-23 15:42:05,305 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x32ad29cde5 (unknown)
> 2014-09-23 15:42:05,306 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x32ad29cf33 (unknown)
> 2014-09-23 15:42:05,308 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x822c32 kudu::log::Log::CreatePlaceholderSegment()
> 2014-09-23 15:42:05,311 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x823042 kudu::log::Log::PreAllocateNewSegment()
> 2014-09-23 15:42:05,315 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x82ae43 kudu::log::Log::SegmentAllocationTask::Run()
> 2014-09-23 15:42:05,318 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x10f125c kudu::FutureTask::Run()
> 2014-09-23 15:42:05,321 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x10f879f kudu::ThreadPool::DispatchThread()
> 2014-09-23 15:42:05,323 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x10f571f kudu::Thread::SuperviseThread()
> 2014-09-23 15:42:05,324 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x32aae07851 (unknown)
> 2014-09-23 15:42:05,325 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x32aaae894d (unknown)
> 2014-09-23 15:42:05,325 (kudu-tablet_server) [INFO -
> kudu.rpc.BaseKuduTest$ProcessInputStreamLogPrinterRunnable.run(BaseKuduTest.java:304)]
> @ 0x0 (unknown)
> {noformat}
> And here's another example from linked_list-test:
> {noformat}
> *** Aborted at 1405495265 (unix time) try "date -d @1405495265" if you
> are using GNU date ***
> PC: @ 0x72032f GetStackTrace()
> *** SIGSEGV (@0x7fffe9d08d08) received by PID 9817 (TID 0x7fc3a5bce800)
> from PID 18446744073337343240; stack trace: ***
> @ 0x3918c0f4a0 (unknown) at ??:0
> @ 0x72032f GetStackTrace() at
> /home/adar/kudu/thirdparty/gperftools-2.1/src/stacktrace_x86-inl.h:325
> @ 0x71a279 MallocHook_GetCallerStackTrace at
> /home/adar/kudu/thirdparty/gperftools-2.1/src/malloc_hook.cc:666
> @ 0x723376 NewHook() at
> /home/adar/kudu/thirdparty/gperftools-2.1/src/heap-checker.cc:575
> @ 0x719da6 MallocHook::InvokeNewHookSlow() at
> /home/adar/kudu/thirdparty/gperftools-2.1/src/malloc_hook.cc:525
> @ 0xa4ed43 tc_new at
> /home/adar/kudu/thirdparty/gperftools-2.1/src/tcmalloc.cc:1607
> @ 0x30d0c9c3c9 (unknown) at ??:0
> @ 0x30d0c9cde5 (unknown) at ??:0
> @ 0x30d0c9cf33 (unknown) at ??:0
> @ 0x77afb4
> kudu::master::MasterServiceProxy::MasterServiceProxy() at
> /home/adar/kudu/src/master/master.proxy.cc:16
> @ 0x6e0eff kudu::ExternalMiniCluster::master_proxy() at
> /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:231
> @ 0x6df2f7
> kudu::ExternalMiniCluster::WaitForTabletServerCount() at
> /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:204
> @ 0x6de6f9 kudu::ExternalMiniCluster::Start() at
> /home/adar/kudu/src/integration-tests/external_mini_cluster.cc:117
> @ 0x6cd5d5 kudu::LinkedListTest::RestartCluster() at
> /home/adar/kudu/src/integration-tests/linked_list-test.cc:120
> @ 0x6c98e8
> kudu::LinkedListTest_TestLoadAndVerify_Test::TestBody() at
> /home/adar/kudu/src/integration-tests/linked_list-test.cc:483
> {noformat}
> Both Todd and I tried to fix this by switching gperftools over to using
> libunwind for stack traces. Both times a bunch of unit tests slowed down
> considerably, leading us to abandon our efforts. You can see the gerrits here:
> http://gerrit.sjc.cloudera.com:8080/#/c/3477/
> http://gerrit.sjc.cloudera.com:8080/#/c/3520/
> Todd said that Impala has a hack to make the gperftools unwinder less
> dangerous. That might be worth exploring. Or perhaps it's a known issue fixed
> in newer versions of gperftools.
> https://code.google.com/p/gperftools/issues/detail?id=66 and
> https://code.google.com/p/gperftools/issues/detail?id=547 seem related, and
> were fixed as part of gperftools-2.2 (we're still on 2.1).
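
A note on reading the crash headers quoted above. The following is a minimal
Python sketch, not part of the original report, that decodes the numbers glog
printed. It rests on two assumptions: that the "from PID" field is si_pid read
from a siginfo union where it overlaps the fault address (so for SIGSEGV it is
not a real PID), and that the TID field is the pthread_self() value. The
function names and the 4 KiB page size are illustrative choices.

```python
from datetime import datetime, timezone

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_low32(value: int) -> int:
    """Treat the low 32 bits of value as a signed int, viewed as unsigned 64-bit."""
    low = value & MASK32
    if low & 0x80000000:
        low -= 1 << 32
    return low & MASK64


def decode_header(unix_time: int, fault_addr: int, tid: int, reported_pid: int) -> dict:
    """Decode the fields of a glog '*** SIGSEGV ...' header (hypothetical helper)."""
    return {
        # Same conversion as the suggested `date -d @<unix_time>`, but in UTC.
        "utc": datetime.fromtimestamp(unix_time, tz=timezone.utc).isoformat(),
        # True if the reported PID is just the fault address reinterpreted.
        "pid_is_addr_alias": sign_extend_low32(fault_addr) == reported_pid,
        # Distance from the thread handle to the faulting address.
        "fault_minus_tid": fault_addr - tid,
        # Offset within a page, assuming 4 KiB pages.
        "page_offset": fault_addr & 0xFFF,
    }


# Values copied from the two traces in the issue description.
java_test = decode_header(1411512125, 0x7FAF9D010008, 0x7FAF9D00F700,
                          18446744072048672776)
linked_list = decode_header(1405495265, 0x7FFFE9D08D08, 0x7FC3A5BCE800,
                            18446744073337343240)

if __name__ == "__main__":
    print(java_test)
    print(linked_list)
```

For both traces the reported "PID" equals the sign-extended low 32 bits of the
fault address, so it can be ignored. In the first trace the fault address is
0x908 bytes above the TID value and 8 bytes past a page boundary. That is
consistent with the frame-pointer unwinder reading just past the end of the
thread's stack mapping, but it does not prove it.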