[ https://issues.apache.org/jira/browse/HAWQ-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lei Chang closed HAWQ-252.
--------------------------

> Coredump When RM Reconnect libyarn
> ----------------------------------
>
>                 Key: HAWQ-252
>                 URL: https://issues.apache.org/jira/browse/HAWQ-252
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Resource Manager
>            Reporter: Lin Wen
>            Assignee: Lin Wen
>             Fix For: 2.0.0
>
>
> Coredump When RM Reconnect libyarn
> Missing separate debuginfos, use: debuginfo-install hawq-2.0.0.0_beta-19011.x86_64
> (gdb) bt
> #0  0x0000000000e661f8 in std::string::_Rep::_S_empty_rep_storage ()
> #1  0x00007f7f1f20947c in libyarn::LibYarnClient::dummyAllocate (this=<value optimized out>)
>     at /data1/pulse2-agent/agents/agent1/work/LIBYARN-main-opt/rhel5_x86_64/src/libyarnclient/LibYarnClient.cpp:330
> #2  0x00007f7f1f209988 in libyarn::heartbeatFunc (args=<value optimized out>)
>     at /data1/pulse2-agent/agents/agent1/work/LIBYARN-main-opt/rhel5_x86_64/src/libyarnclient/LibYarnClient.cpp:114
> #3  0x000000350b4079d1 in start_thread () from /lib64/libpthread.so.0
> #4  0x000000350b0e8b6d in clone () from /lib64/libc.so.6
> (gdb) info thread
>   4 Thread 0x7f7efc239700 (LWP 760442)  0x000000350b40b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   3 Thread 0x7f7f1a1758c0 (LWP 760441)  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
>   2 Thread 0x7f7efae37700 (LWP 760797)  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> * 1 Thread 0x7f7efb838700 (LWP 760443)  0x0000000000e661f8 in std::string::_Rep::_S_empty_rep_storage ()
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7f7efae37700 (LWP 760797))]#0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> (gdb) bt
> #0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> #1  0x000000350b0e1e54 in usleep () from /lib64/libc.so.6
> #2  0x00007f7f1f209999 in libyarn::heartbeatFunc (args=<value optimized out>)
>     at /data1/pulse2-agent/agents/agent1/work/LIBYARN-main-opt/rhel5_x86_64/src/libyarnclient/LibYarnClient.cpp:131
> #3  0x000000350b4079d1 in start_thread () from /lib64/libpthread.so.0
> #4  0x000000350b0e8b6d in clone () from /lib64/libc.so.6
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7f7f1a1758c0 (LWP 760441))]#0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> (gdb) bt
> #0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> #1  0x000000350b0e1e54 in usleep () from /lib64/libc.so.6
> #2  0x00000000008dd8b9 in RB2YARN_registerYARNApplication () at resourcebroker_LIBYARN_proc.c:1354
> #3  0x00000000008df8ad in RB2YARN_initializeConnection () at resourcebroker_LIBYARN_proc.c:1270
> #4  0x00000000008dfc93 in ResBrokerMainInternal () at resourcebroker_LIBYARN_proc.c:202
> #5  0x00000000008dff79 in ResBrokerMain () at resourcebroker_LIBYARN_proc.c:157
> #6  0x00000000008dc246 in RB_LIBYARN_start (isforked=<value optimized out>) at resourcebroker_LIBYARN.c:153
> #7  0x0000000000903bda in MainHandlerLoop () at resourcemanager.c:531
> #8  0x00000000009041f1 in ResManagerMainServer2ndPhase () at resourcemanager.c:508
> #9  0x0000000000904624 in ResManagerMain (argc=<value optimized out>, argv=<value optimized out>) at resourcemanager.c:330
> #10 0x00000000009049b1 in ResManagerProcessStartup () at resourcemanager.c:402
> #11 0x0000000000764b08 in CommenceNormalOperations () at postmaster.c:3616
> #12 0x00000000007659c2 in do_reaper () at postmaster.c:3964
> #13 0x000000000076a01d in ServerLoop () at postmaster.c:2102
> #14 0x000000000076bb5e in PostmasterMain (argc=9, argv=0x32a15b0) at postmaster.c:1421
> #15 0x00000000006c691a in main (argc=9, argv=0x32a1570) at main.c:226
>
> Two heartbeat threads exist at this moment (threads 1 and 2 are both inside
> libyarn::heartbeatFunc), which means the old heartbeat thread was not
> canceled when the RM reconnected to libyarn.
> In ResBrokerMainInternal(), starting at line 270, the heartbeat thread should
> be canceled before RB2YARN_disconnectFromYARN is called.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
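
The ordering proposed in the description (stop the heartbeat thread first, disconnect second) can be illustrated with a small sketch. It is not the HAWQ or libyarn code: `HeartbeatClient` and every member name below are hypothetical. It also uses a cooperative stop flag plus `join()` instead of the thread cancellation the report mentions, on the assumption that a heartbeat loop can check a flag between beats.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a client that owns a single heartbeat thread.
// None of these names come from libyarn or HAWQ.
class HeartbeatClient {
public:
    ~HeartbeatClient() { stopHeartbeat(); }

    void connect() {
        assert(!worker_.joinable());  // a second heartbeat must never be started
        keepRunning_ = true;
        worker_ = std::thread([this] { heartbeatLoop(); });
    }

    // Stop the heartbeat first; only then is it safe to release
    // connection state that the heartbeat loop may still be using.
    void disconnect() {
        stopHeartbeat();
        // Connection state would be released here (assumption).
    }

    void reconnect() {
        disconnect();
        connect();
    }

    int liveHeartbeats() const { return live_.load(); }

private:
    void stopHeartbeat() {
        keepRunning_ = false;
        if (worker_.joinable()) {
            worker_.join();  // wait until the loop has really exited
        }
    }

    void heartbeatLoop() {
        ++live_;
        while (keepRunning_) {
            // A real loop would send a heartbeat here.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        --live_;
    }

    std::atomic<bool> keepRunning_{false};
    std::atomic<int> live_{0};
    std::thread worker_;
};
```

Because `reconnect()` joins the old thread before starting a new one, at most one heartbeat loop can be running at any time, which is the invariant the core dump shows being violated.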