[jira] [Commented] (KUDU-3096) debug-util-test failed sometimes on aarch64: Segmentation fault

2020-05-18 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110249#comment-17110249
 ] 

ASF subversion and git services commented on KUDU-3096:
---

Commit 3a8e9c0f20b801f4d81211c67f26b69152547dc9 in kudu's branch 
refs/heads/master from RuiChen
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3a8e9c0 ]

KUDU-3096: Upgrade libunwind to 1.4.0

libunwind 1.3.1 uses malloc during thread stack trace collection,
which causes a core dump and a potential deadlock between
GetThreadStack and SuperviseThread: GetThreadStack tries to collect
thread info even if the tcmalloc ThreadCache has not been completely
initialized. The issue occurs on ARM64 servers. Upstream fixed the
malloc issue in libunwind 1.4.0, so upgrade to it.

Change-Id: Icc722cd5e8ed4ed668d279f6ec831e4eeb69f955
Reviewed-on: http://gerrit.cloudera.org:8080/15899
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke 
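
The general pattern behind this kind of fix is to keep allocation out of code that may run in a signal handler. The following is only a sketch of that pattern, not libunwind's or Kudu's actual code: `FixedStackTrace`, `kMaxFrames`, and `CollectInto` are hypothetical names, and the "program counters" are plain integers supplied by the caller rather than values produced by a real unwinder.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity frame buffer. All storage lives inside the
// object itself, so filling it never calls malloc.
class FixedStackTrace {
 public:
  static constexpr std::size_t kMaxFrames = 16;  // assumed capacity

  // Appends one frame address. Returns false and drops the frame when full.
  bool Push(std::uintptr_t pc) {
    if (size_ == kMaxFrames) return false;
    frames_[size_++] = pc;
    return true;
  }

  std::size_t size() const { return size_; }
  std::uintptr_t frame(std::size_t i) const { return frames_[i]; }

 private:
  std::array<std::uintptr_t, kMaxFrames> frames_{};
  std::size_t size_ = 0;
};

// Stand-in for an unwinder's step loop: copies caller-supplied addresses
// into the buffer and stops at capacity instead of growing on the heap.
// Returns the number of frames actually stored.
std::size_t CollectInto(FixedStackTrace* out, const std::uintptr_t* pcs,
                        std::size_t n) {
  std::size_t stored = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (!out->Push(pcs[i])) break;
    ++stored;
  }
  return stored;
}
```

The trade-off is that deep stacks are truncated at the assumed capacity, which is usually acceptable for diagnostics and avoids re-entering the allocator.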


> debug-util-test failed sometimes on aarch64: Segmentation fault
> ---
>
> Key: KUDU-3096
> URL: https://issues.apache.org/jira/browse/KUDU-3096
> Project: Kudu
>  Issue Type: Sub-task
>Reporter: huangtianhua
>Assignee: RuiChen
>Priority: Major
>
> I tested Kudu on an aarch64 server, based on 
> https://gerrit.cloudera.org/#/c/14964/ . The debug-util-test test fails 
> intermittently; for details, see the gdb output for the core dump file: 
> http://paste.openstack.org/show/791306/
> root@ubuntu:/home/jenkins/workspace/kudu/build/debug# ./bin/debug-util-test
> ..
> W0330 07:30:44.317989 27980 debug-util.cc:405] Leaking SignalData structure 0xf89ed260 after lost signal to thread 28015
> W0330 07:30:44.319747 27980 debug-util.cc:405] Leaking SignalData structure 0xf89ed280 after lost signal to thread 28015
> W0330 07:30:44.319774 27980 debug-util.cc:405] Leaking SignalData structure 0xf89ed2a0 after lost signal to thread 28015
> W0330 07:30:44.326023 27980 debug-util.cc:405] Leaking SignalData structure 0xf89ed2c0 after lost signal to thread 28015
> I0330 07:30:44.336513 27980 debug-util-test.cc:463] Timed out 1410 times
> I0330 07:30:44.336531 27980 debug-util-test.cc:464] Succeeded 13591 times
> [   OK ] DebugUtilTest.TestTimeouts (1002 ms)
> [--] 9 tests from DebugUtilTest (3049 ms total)
> [--] 4 tests from DifferentRaces/RaceTest
> [ RUN  ] DifferentRaces/RaceTest.TestStackTraceRaces/0
> Segmentation fault (core dumped)
> root@ubuntu:/home/jenkins/workspace/kudu/build/debug# gdb bin/debug-util-test core.27980
> GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
> Copyright (C) 2018 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "aarch64-linux-gnu".
> Type "show configuration" for configuration details.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from bin/debug-util-test...done.
> [New LWP 28016]
> [New LWP 27980]
> [New LWP 27981]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
> Core was generated by `./bin/debug-util-test'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  tcmalloc::Sampler::RecordAllocation (k=<optimized out>, this=<optimized out>)
> at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/sampler.h:166
> 166   if (static_cast<size_t>(bytes_until_sample_) < k) {
> [Current thread is 1 (Thread 0x86a9b090 (LWP 28016))]
> Sometimes other tests, such as TestTimeouts, also raise a segmentation fault 
> with the same gdb output. We are not sure whether this is related to 
> gperftools. Could someone help us fix this? Thanks very much.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3096) debug-util-test failed sometimes on aarch64: Segmentation fault

2020-05-11 Thread RuiChen (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104393#comment-17104393
 ] 

RuiChen commented on KUDU-3096:
---

I fixed it in patch [https://gerrit.cloudera.org/#/c/15899/] and then ran 
DebugUtilTest.TestThreadBlockingSignals 1000 times in my VM; all runs 
passed.



[jira] [Commented] (KUDU-3096) debug-util-test failed sometimes on aarch64: Segmentation fault

2020-05-11 Thread RuiChen (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104376#comment-17104376
 ] 

RuiChen commented on KUDU-3096:
---

I hit the same issue on my machine (Ubuntu 18.04, aarch64 VM) when running the 
test case DebugUtilTest.TestThreadBlockingSignals 100 times in a row. Looking 
at the core dump, I found that *GetThreadStack* tries to collect stack info 
from a thread that has not finished initializing, which breaks tcmalloc inside 
ThreadCache. In the core dump provided by huangtianhua, you can see that the 
signal handler is invoked while the tcmalloc ThreadCache is still initializing; 
the handler then starts collecting the thread's stack info, which ends up 
accessing uninitialized variables.
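
To make the ordering problem concrete, here is a minimal sketch that forces a signal handler to run in the middle of an initialization routine. It is not tcmalloc or Kudu code: `FakeThreadCache`, `OnStackTraceSignal`, and `InitCacheWithSignalInMiddle` are hypothetical names, the handler only reads flags instead of unwinding, and the sketch assumes a POSIX system where `SIGUSR1` is available. `std::raise` delivers the signal synchronously, so the interleaving is deterministic here, whereas in the real test it depends on timing.

```cpp
#include <cassert>
#include <csignal>

// Hypothetical stand-in for a per-thread allocator cache (not tcmalloc's type).
struct FakeThreadCache {
  volatile std::sig_atomic_t lists_ready = 0;  // free lists set up so far
  volatile std::sig_atomic_t inited = 0;       // set only when init finishes
};

constexpr int kNumLists = 4;  // assumed number of free lists
FakeThreadCache g_cache;

// What the handler observed; -1 means "handler never ran".
volatile std::sig_atomic_t g_seen_lists = -1;
volatile std::sig_atomic_t g_seen_inited = -1;

// Plays the role of the stack-trace signal handler. It only reads flags,
// which is async-signal-safe. A handler that allocated memory at this point
// would be operating on the half-built cache.
extern "C" void OnStackTraceSignal(int /*signo*/) {
  g_seen_lists = g_cache.lists_ready;
  g_seen_inited = g_cache.inited;
}

// Initializes the cache and delivers a signal to the calling thread halfway
// through, mimicking a stack-trace request that arrives during thread startup.
void InitCacheWithSignalInMiddle() {
  std::signal(SIGUSR1, OnStackTraceSignal);
  for (int i = 0; i < kNumLists; ++i) {
    g_cache.lists_ready = i + 1;
    if (i + 1 == kNumLists / 2) {
      std::raise(SIGUSR1);  // the handler runs before raise() returns
    }
  }
  g_cache.inited = 1;
}
```

After `InitCacheWithSignalInMiddle()` returns, the recorded values show that the handler ran while only half of the lists were set up and before `inited` was set, which is the window in which a malloc call from the handler is unsafe.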

 

root@ubuntu:/home/jenkins/workspace/kudu/build/debug# gdb bin/debug-util-test core.27980
...
(gdb) bt
#0 *tcmalloc::Sampler::RecordAllocation* (k=<optimized out>, this=<optimized out>)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/sampler.h:166
#1 tcmalloc::ThreadCache::SampleAllocation (k=<optimized out>, this=<optimized out>)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/thread_cache.h:489
#2 (anonymous namespace)::do_malloc (size=136)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1360

#7 tc_malloc (size=size@entry=136)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/tcmalloc.cc:1873
#8 0x9764f3dc in load_debug_frame (is_local=<optimized out>, bufsize=<optimized out>, buf=<optimized out>, file=<optimized out>)
 at /home/jenkins/workspace/kudu/thirdparty/src/libunwind-1.3.1/src/dwarf/Gfind_proc_info-lsb.c:127
...
#18 0x9764b6c8 in _ULaarch64_step (cursor=0x86a96f10)
 at /home/jenkins/workspace/kudu/thirdparty/src/libunwind-1.3.1/src/aarch64/Gstep.c:146
#19 0x9824a4b4 in kudu::StackTrace::Collect (this=0xdbaad4a0, skip_frames=0)
 at /home/jenkins/workspace/kudu/src/kudu/util/debug-util.cc:615
#20 0x98248c48 in kudu::(anonymous namespace)::HandleStackTraceSignal (info=0x86a991b0)
 at /home/jenkins/workspace/kudu/src/kudu/util/debug-util.cc:289
#21 <signal handler called>
#22 tcmalloc::ThreadCache::FreeList::Init (size=16, this=0xf7da1640)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/thread_cache.h:164
#23 *tcmalloc::ThreadCache::Init* (this=this@entry=0xf7da1600, tid=tid@entry=281472941011088)
 at /home/jenkins/workspace/kudu/thirdparty/src/gperftools-2.6.90/src/thread_cache.cc:98

#34 0x98368454 in *kudu::Thread::SuperviseThread* (arg=0xf861d200)
 at /home/jenkins/workspace/kudu/src/kudu/util/thread.cc:666
#35 0x979e9088 in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#36 0x979594ec in ?? () from /lib/aarch64-linux-gnu/libc.so.6

 

The core dump occurs here: *static_cast<size_t>(bytes_until_sample_)*
{code:java}
inline bool Sampler::RecordAllocation(size_t k) {
  // The first time we enter this function we expect bytes_until_sample_
  // to be zero, and we must call SampleAllocationSlow() to ensure
  // proper initialization of static vars.
  ASSERT(Static::IsInited() || bytes_until_sample_ == 0);

  // Note that we have to deal with arbitrarily large values of k
  // here. Thus we're upcasting bytes_until_sample_ to unsigned rather
  // than the other way around. And this is why this code cannot be
  // merged with DecrementFast code below.
  if (static_cast<size_t>(bytes_until_sample_) < k) {
    bool result = RecordAllocationSlow(k);
    ASSERT(Static::IsInited());
    return result;
  } else {
    bytes_until_sample_ -= k;
    ASSERT(Static::IsInited());
    return true;
  }
}
{code}
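
The comparison in the quoted function can be tried in isolation. This is only a sketch: `TakesSlowPath` is a hypothetical free function, and it assumes the counter is a signed integer that is cast up to an unsigned type such as `size_t`, as the comment in the quoted code implies. It shows that a zero counter (the expected state on first entry) always takes the slow path for a non-zero request such as the 136-byte malloc in the backtrace, and that arbitrarily large `k` values compare correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when the request of k bytes would exhaust the remaining
// sampling budget, i.e. when the slow path would be taken.
// The signed counter is cast up to unsigned so that very large k values
// are compared correctly.
bool TakesSlowPath(std::ptrdiff_t bytes_until_sample, std::size_t k) {
  return static_cast<std::size_t>(bytes_until_sample) < k;
}
```

With a counter of zero, `TakesSlowPath(0, 136)` is true, so the first allocation on a thread goes through the slower initialization path described in the function's comment.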
 


[jira] [Commented] (KUDU-3096) debug-util-test failed sometimes on aarch64: Segmentation fault

2020-04-01 Thread huangtianhua (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072477#comment-17072477
 ] 

huangtianhua commented on KUDU-3096:


[~adar], I applied a patch that adds arm64 atomicops support to gperftools, 
but it didn't help. [~tlipcon], could you take a look at this? 
Thanks.



[jira] [Commented] (KUDU-3096) debug-util-test failed sometimes on aarch64: Segmentation fault

2020-03-30 Thread Adar Dembo (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071181#comment-17071181
 ] 

Adar Dembo commented on KUDU-3096:
--

[~tlipcon] any ideas here? I remember the stacktrace collection code has a 
bunch of gnarly optimizations built-in; are any x86-specific?
