[ 
https://issues.apache.org/jira/browse/IMPALA-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929972#comment-15929972
 ] 

Antoni Ivanov edited comment on IMPALA-3875 at 3/17/17 2:00 PM:
----------------------------------------------------------------

Thanks for the help. 
Upgrading is too costly for us to do quickly at the moment (we follow 
CDH upgrades, which upgrade everything). 

For reference, here is what we did to manage the issue:

- We found the processes that were polling impalad port 25000 for statistics 
(impalad:25000/jsonmetrics?json). We have a few agents (monitoring and haproxy 
agents) and we stopped them. The issue also occurred a few times with the 
Cloudera Manager agent; we didn't stop that agent but simply restarted Impala, 
since we are not sure what depends on it. 

- We suspect that a configuration change made to align better with Hadoop 
recommendations (sysctl net.core.somaxconn from 128 to 1024, and ifconfig 
eth0 txqueuelen from 1000 to 4000) may have worsened things and started 
causing this issue, but we haven't confirmed this. 
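
To help confirm or rule out the second point, the current values of the two settings can be read back on each host and compared before and after the change. The following is only a minimal sketch: the helper name `read_int_setting` is made up for illustration, the interface name `eth0` is taken from the comment above, and the two paths are the standard Linux locations for these values.

```python
from pathlib import Path
from typing import Optional

# Standard Linux locations for the two settings discussed above.
# "eth0" is the interface named in the comment; adjust for other hosts.
SOMAXCONN_PATH = "/proc/sys/net/core/somaxconn"
TXQUEUELEN_PATH = "/sys/class/net/eth0/tx_queue_len"


def read_int_setting(path: str) -> Optional[int]:
    """Return the integer stored in a procfs/sysfs file, or None if it
    cannot be read or parsed (hypothetical helper, for illustration only)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for name, path in (("net.core.somaxconn", SOMAXCONN_PATH),
                       ("eth0 txqueuelen", TXQUEUELEN_PATH)):
        print(f"{name}: {read_int_setting(path)}")
```

Running this on a host before and after reverting the change gives a quick record of which values were in effect when the hang occurred.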









> Thrift threaded server hang in some cases
> -----------------------------------------
>
>                 Key: IMPALA-3875
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3875
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.6.0
>            Reporter: Huaisi Xu
>            Assignee: Sailesh Mukil
>            Priority: Blocker
>             Fix For: Impala 2.8.0
>
>
> Hanging looks like this:
> {code:java}
> #0  0x000000398340e82d in read () from 05r/lib64/libpthread.so.0
> #1  0x00000039870dea71 in ?? () from 05r/usr/lib64/libcrypto.so.10
> #2  0x00000039870dcdc9 in BIO_read () from 05r/usr/lib64/libcrypto.so.10
> #3  0x0000003989431873 in ssl23_read_bytes () from 05r/usr/lib64/libssl.so.10
> #4  0x000000398942fe63 in ssl23_get_client_hello () from 
> 05r/usr/lib64/libssl.so.10
> #5  0x00000039894302f3 in ssl23_accept () from 05r/usr/lib64/libssl.so.10
> #6  0x00000000015ee4bc in 
> apache::thrift::transport::TSSLSocket::checkHandshake (this=0xf317b00) at 
> src/thrift/transport/TSSLSocket.cpp:228
> #7  0x00000000015ee820 in apache::thrift::transport::TSSLSocket::read 
> (this=0xf317b00, buf=0x7f8a9ea750a0 "@S\247\236\212\177", len=5) at 
> src/thrift/transport/TSSLSocket.cpp:164
> #8  0x00000000015ebc4f in 
> apache::thrift::transport::readAll<apache::thrift::transport::TSocket> 
> (trans=..., buf=0x7f8a9ea750a0 "@S\247\236\212\177", len=5) at 
> src/thrift/transport/TTransport.h:39
> #9  0x0000000000a80228 in apache::thrift::transport::TTransport::readAll 
> (len=5, buf=0x7f8a9ea750a0 "@S\247\236\212\177", this=<optimized out>) at 
> /usr/src/debug/impala-2.3.0-cdh5.5.2/thirdparty/thrift-0.9.0/build/include/thrift/transport/TTransport.h:126
> #10 apache::thrift::transport::TSaslTransport::receiveSaslMessage 
> (this=0xb6a0770, status=0x7f8a9ea752e4, length=0x7f8a9ea752e8) at 
> /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslTransport.cpp:237
> #11 0x0000000000a7dc84 in 
> apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage 
> (this=0xb6a0770) at 
> /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslServerTransport.cpp:80
> #12 0x0000000000a8075e in apache::thrift::transport::TSaslTransport::open 
> (this=0xb6a0770) at 
> /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslTransport.cpp:95
> #13 0x0000000000a7e9c1 in 
> apache::thrift::transport::TSaslServerTransport::Factory::getTransport 
> (this=0xd0edcb0, trans=...) at 
> /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/transport/TSaslServerTransport.cpp:145
> #14 0x00000000015f6f78 in apache::thrift::server::TThreadedServer::serve 
> (this=0xc181420) at src/thrift/server/TThreadedServer.cpp:162
> #15 0x000000000095149c in 
> impala::ThriftServer::ThriftServerEventProcessor::Supervise (this=<optimized 
> out>) at /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/rpc/thrift-server.cc:173
> #16 0x0000000000ae0faa in boost::function0<void>::operator() (this=<optimized 
> out>) at 
> /opt/toolchain/boost-pic-1.55.0/include/boost/function/function_template.hpp:767
> #17 impala::Thread::SuperviseThread(std::string const&, std::string const&, 
> boost::function<void ()>, impala::Promise<long>*) (name=..., category=..., 
> functor=..., thread_started=0x7fff9af4ca60) at 
> /usr/src/debug/impala-2.3.0-cdh5.5.2/be/src/util/thread.cc:314
> #18 0x0000000000ae3250 in 
> boost::_bi::list4<boost::_bi::value<std::basic_string<char, 
> std::char_traits<char>, std::allocator<char> > >, 
> boost::_bi::value<std::basic_string<char, std::char_traits<char>, 
> std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, 
> boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const 
> std::string&, const std::string&, impala::Thread::ThreadFunctor, 
> impala::Promise<long int>*), boost::_bi::list0> (a=...,
>     f=@0xc3747b8: 0xae0df0 <impala::Thread::SuperviseThread(std::string 
> const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*)>, this=0xc3747c0) at 
> /opt/toolchain/boost-pic-1.55.0/include/boost/bind/bind.hpp:457
> #19 boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, 
> boost::function<void ()>, impala::Promise<long>*), 
> boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > >::operator()() (this=0xc3747b8) 
> at /opt/toolchain/boost-pic-1.55.0/include/boost/bind/bind_template.hpp:20
> #20 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string 
> const&, std::string const&, boost::function<void ()>, 
> impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, 
> boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, 
> boost::_bi::value<impala::Promise<long>*> > > >::run() (this=0xc374600) at 
> /opt/toolchain/boost-pic-1.55.0/include/boost/thread/detail/thread.hpp:117
> #21 0x0000000000d28c43 in ?? ()
> #22 0x0000003983407aa1 in start_thread () from 05r/lib64/libpthread.so.0
> #23 0x00000039830e893d in clone () from 05r/lib64/libc.so.6
> {code}
> This is very bad: the whole threaded server thread hangs because 
> it never gets a chance to dispatch the new serving thread via thread->start().
> This impalad becomes a zombie.
> From 
> http://github.mtv.cloudera.com/CDH/Impala/blob/cdh5-trunk/be/src/runtime/client-cache.cc#L106-L113
> we should probably set a socket timeout before OpenWithRetry().
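
The effect of a receive timeout on a blocking read like the one in frame #0 can be illustrated in isolation. This is a sketch of the general technique only, written in Python with plain sockets; it is not Impala or Thrift code, and the helper name `read_exact` is made up for illustration. A peer that connects and then sends nothing stands in for the client that never completes the handshake.

```python
import socket
from typing import Optional


def read_exact(conn: socket.socket, n: int, timeout_s: float) -> Optional[bytes]:
    """Read exactly n bytes from conn, or return None if the peer stays
    silent or closes the connection (hypothetical helper).

    The timeout applies to each recv() call, not to the read as a whole.
    """
    conn.settimeout(timeout_s)
    chunks = []
    remaining = n
    try:
        while remaining > 0:
            chunk = conn.recv(remaining)
            if not chunk:
                return None  # peer closed before sending n bytes
            chunks.append(chunk)
            remaining -= len(chunk)
    except socket.timeout:
        return None  # peer sent nothing within timeout_s
    return b"".join(chunks)


if __name__ == "__main__":
    server_side, silent_client = socket.socketpair()
    # Without a timeout this read would block indefinitely; with one,
    # the caller gets control back and can go on accepting connections.
    print(read_exact(server_side, 5, 0.2))
    server_side.close()
    silent_client.close()
```

Without the `settimeout` call, the first `recv` would block until the peer sends data or disconnects, which matches the behaviour seen in the backtrace.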



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
