[ 
https://issues.apache.org/jira/browse/HAWQ-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229958#comment-15229958
 ] 

ASF GitHub Bot commented on HAWQ-635:
-------------------------------------

Github user wangzw commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/564#issuecomment-206774175
  
    If the issue is caused by an exception thrown by 
RpcClientImpl::getChannel(), the only possibility is that the system ran out 
of memory or failed to create a thread. In either case, you will see the error 
message "RpcClient failed to create a channel to" in the HAWQ log. 
    
    Please refer to the code in RpcClientImpl::getChannel():
    
    
        try {
           ...
    
            rc->addRef();
    
            if (!cleaning) {
                cleaning = true;
    
                if (cleaner.joinable()) {
                    cleaner.join();
                }
    
                CREATE_THREAD(cleaner, bind(&RpcClientImpl::clean, this));
            }
        } catch ...
    
    
    
    Wouldn't it be much easier to fix this issue by moving ```rc->addRef();``` 
to the end of the ```try``` block, so that the reference is taken only after 
thread creation has succeeded?
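
To illustrate why the position of the reference increment matters, here is a minimal sketch. It is not libhdfs3 code: `Channel`, `acquireEarly`, `acquireLate` and the `startCleaner` callback are hypothetical names, and the callback only stands in for a step such as thread creation that may throw. The assumption is that a reference taken before a throwing step is never released.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for an RPC channel; only the reference count matters here.
struct Channel {
    int refs = 0;
    void addRef() { ++refs; }
};

// Ordering as in the excerpt: the reference is taken before the step that may throw.
bool acquireEarly(Channel& ch, const std::function<void()>& startCleaner) {
    try {
        ch.addRef();
        startCleaner();  // may throw, e.g. when a thread cannot be created
        return true;
    } catch (const std::exception&) {
        return false;    // the reference taken above is never released
    }
}

// Suggested ordering: take the reference only after every fallible step succeeded.
bool acquireLate(Channel& ch, const std::function<void()>& startCleaner) {
    try {
        startCleaner();  // may throw; nothing has been counted yet
        ch.addRef();
        return true;
    } catch (const std::exception&) {
        return false;    // nothing to undo
    }
}
```

With a callback that throws, `acquireEarly` leaves the count at 1 even though the caller never received the channel, while `acquireLate` leaves it at 0. A guard object that drops the reference on failure would be an alternative, but reordering is the smaller change.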



> QE process does not exit in libhdfs
> -----------------------------------
>
>                 Key: HAWQ-635
>                 URL: https://issues.apache.org/jira/browse/HAWQ-635
>             Project: Apache HAWQ
>          Issue Type: Bug
>            Reporter: Ming LI
>            Assignee: Lei Chang
>
> The QE process cannot exit. 
> The calling stack is:
> [gpadmin@sdw3 ~]$ pstack 489333
> #0  0x00000033f560ef3d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00007ff75309c74a in boost::this_thread::hiden::sleep_for(timespec 
> const&) () from 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libboost_thread.so.1.53.0
> #2  0x00007ff755b850b8 in Hdfs::Internal::RpcChannelImpl::waitForExit() () 
> from 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1
> #3  0x00007ff755b97eff in Hdfs::Internal::RpcClientImpl::close() () from 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1
> #4  0x00007ff755b98094 in Hdfs::Internal::RpcClientImpl::~RpcClientImpl() () 
> from 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1
> #5  0x0000000000540c59 in boost::detail::shared_count::~shared_count() ()
> #6  0x00000033f52361bd in __cxa_finalize () from /lib64/libc.so.6
> #7  0x00007ff755b04456 in __do_global_dtors_aux () from 
> /data/pulse-agent-data/HAWQ-main-FeatureTest-opt-sanity/product/hawq/./lib/libhdfs3.so.1
> #8  0x0000000000000000 in ?? ()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
