[ https://issues.apache.org/jira/browse/HAWQ-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229958#comment-15229958 ]
ASF GitHub Bot commented on HAWQ-635:
-------------------------------------

Github user wangzw commented on the pull request:

https://github.com/apache/incubator-hawq/pull/564#issuecomment-206774175

If the issue is caused by an exception thrown by RpcClientImpl::getChannel(), the only possible causes are that the system ran out of memory or failed to create a thread. In either case, you will see the error message "RpcClient failed to create a channel to" in the HAWQ log. Please refer to the code in RpcClientImpl::getChannel():

    try {
        ...
        rc->addRef();

        if (!cleaning) {
            cleaning = true;

            if (cleaner.joinable()) {
                cleaner.join();
            }

            CREATE_THREAD(cleaner, bind(&RpcClientImpl::clean, this));
        }
    } catch ...

Isn't it much easier to fix this issue by moving `rc->addRef();` to the end of the `try` block?

> QE process does not exit in libhdfs
> -----------------------------------
>
>                 Key: HAWQ-635
>                 URL: https://issues.apache.org/jira/browse/HAWQ-635
>             Project: Apache HAWQ
>          Issue Type: Bug
>            Reporter: Ming LI
>            Assignee: Lei Chang
>
> The QE process cannot exit.
> The call stack is:
> [gpadmin@sdw3 ~]$ pstack 489333
> #0  0x00000033f560ef3d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00007ff75309c74a in boost::this_thread::hiden::sleep_for(timespec const&) () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0
> #2  0x00007ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1
> #3  0x00007ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1
> #4  0x00007ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1
> #5  0x0000000000540c59 in boost::detail::shared_count::~shared_count() ()
> #6  0x00000033f52361bd in __cxa_finalize () from /lib64/libc.so.6
> #7  0x00007ff755b04456 in __do_global_dtors_aux () from /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1
> #8  0x0000000000000000 in ?? ()

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
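A minimal, self-contained C++ sketch of why the ordering suggested in the comment matters. It assumes the hang comes from a reference taken by `rc->addRef()` that is never released when a later statement in the `try` block (such as `CREATE_THREAD`) throws: the caller never receives the channel, so it never drops that reference, and a `waitForExit()`-style loop that polls for the count to reach zero (compare frames #0 to #2 above) would sleep forever. `ToyChannel`, `startCleaner`, `getChannelAddRefFirst` and `getChannelAddRefLast` are hypothetical names, not libhdfs3 APIs. `startCleaner` stands in for `CREATE_THREAD(cleaner, ...)`, the catch handlers simply return `nullptr` because the real handler is elided in the excerpt, and the deadline on `waitForExit` is an addition so the demo terminates instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a reference-counted RPC channel. Only the
// reference count and a waitForExit()-style polling loop are modeled.
struct ToyChannel {
    std::atomic<int> refs{0};

    void addRef() { ++refs; }

    void release() {
        int previous = refs.fetch_sub(1);
        assert(previous > 0);  // releasing a reference that was never taken is a bug
        (void)previous;
    }

    // Polls until refs drops to zero. The deadline is an addition for this
    // sketch so that a leaked reference is reported (returns false) instead
    // of sleeping forever.
    bool waitForExit(std::chrono::milliseconds deadline) const {
        const auto start = std::chrono::steady_clock::now();
        while (refs.load() != 0) {
            if (std::chrono::steady_clock::now() - start >= deadline) {
                return false;
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return true;
    }
};

// Hypothetical stand-in for CREATE_THREAD(cleaner, ...): throws when
// thread creation is simulated to fail.
void startCleaner(bool failThreadCreation) {
    if (failThreadCreation) {
        throw std::runtime_error("failed to create thread");
    }
}

// Ordering as in the quoted excerpt: addRef() runs before a step that can
// throw. On failure the caller gets nullptr and so never calls release().
ToyChannel* getChannelAddRefFirst(ToyChannel& channel, bool failThreadCreation) {
    try {
        channel.addRef();
        startCleaner(failThreadCreation);
        return &channel;
    } catch (const std::exception&) {
        return nullptr;  // the reference taken above is leaked
    }
}

// Proposed ordering: addRef() is the last statement in the try block, so
// the reference is only taken once nothing earlier in the block has thrown.
ToyChannel* getChannelAddRefLast(ToyChannel& channel, bool failThreadCreation) {
    try {
        startCleaner(failThreadCreation);
        channel.addRef();
        return &channel;
    } catch (const std::exception&) {
        return nullptr;  // no reference was taken
    }
}
```

With the original ordering, a failed `getChannelAddRefFirst(channel, true)` leaves `channel.refs` at 1, so `channel.waitForExit(...)` times out; with `getChannelAddRefLast(channel, true)` the count stays at 0 and the wait returns immediately. Releasing the reference inside the catch handler, or holding it in a scope guard until the `try` block completes, would be alternative fixes; those are different techniques from the reordering proposed in the comment.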