[ https://issues.apache.org/jira/browse/IMPALA-11653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kurt Deschler reassigned IMPALA-11653: -------------------------------------- Assignee: Fang-Yu Rao (was: Qifan Chen) > Identify and time out connections that are not from a supported Impala client > more eagerly > ------------------------------------------------------------------------------------------ > > Key: IMPALA-11653 > URL: https://issues.apache.org/jira/browse/IMPALA-11653 > Project: IMPALA > Issue Type: Improvement > Affects Versions: Impala 4.1.0 > Reporter: Vincent Tran > Assignee: Fang-Yu Rao > Priority: Major > Attachments: simple_tcp_client.py > > > When a tcp client opens a connection to an Impala client interface (hs2 or > beeswax), the connection is accepted immediately after the 3-way handshake > (SYN, SYN-ACK, ACK) and is queued forĀ > *TAcceptQueueServer::SetupConnection()*. However, if the client sends > nothing else, the ImpalaServer will block in > *apache::thrift::transport::TSocket::read()* until the client sends a RST/FIN > or until *sasl_connect_tcp_timeout_ms* elapses (which is by default, 5 > minutes). > The connection setup thread stack trace can be observed below during this > period. > {noformat} > (gdb) bt > #0 0x00007f3b972ee20d in poll () from ./lib64/libc.so.6 > #1 0x0000000002dcd5bc in apache::thrift::transport::TSocket::read(unsigned > char*, unsigned int) () > #2 0x0000000002dd1803 in unsigned int > apache::thrift::transport::readAll<apache::thrift::transport::TSocket>(apache::thrift::transport::TSocket&, > unsigned char*, unsigned int) () > #3 0x0000000001330cc9 in readAll (len=5, buf=0x7f3277ea4f8b "", > this=<optimized out>) at > ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.9.3-p8/include/thrift/transport/TTransport.h:121 > #4 apache::thrift::transport::TSaslTransport::receiveSaslMessage > (this=this@entry=0x278a96b0, status=status@entry=0x7f3277ea500c, > length=length@entry=0x7f3277ea5008) at TSaslTransport.cpp:259 > #5 0x000000000132db14 in > apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage > (this=0x278a96b0) at TSaslServerTransport.cpp:95 > #6 0x0000000001330e33 in > apache::thrift::transport::TSaslTransport::doSaslNegotiation > (this=0x278a96b0) at TSaslTransport.cpp:81 > #7 0x000000000132e723 in open (this=0x12e29750) at > ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.9.3-p8/include/thrift/transport/TBufferTransports.h:218 > #8 apache::thrift::transport::TSaslServerTransport::Factory::getTransport > (this=0xf825a70, trans=...) at TSaslServerTransport.cpp:173 > #9 0x00000000010cd49d in > apache::thrift::server::TAcceptQueueServer::SetupConnection (this=0x174270c0, > entry=...) at TAcceptQueueServer.cpp:233 > #10 0x00000000010cef4d in operator() (tid=<optimized out>, item=..., > __closure=<optimized out>) at TAcceptQueueServer.cpp:323 > #11 > boost::detail::function::void_function_obj_invoker2<apache::thrift::server::TAcceptQueueServer::serve()::<lambda(int, > const boost::shared_ptr<apache::thrift::server::TAcceptQueueEntry>&)>, void, > int, const > boost::shared_ptr<apache::thrift::server::TAcceptQueueEntry>&>::invoke(boost::detail::function::function_buffer > &, int, const boost::shared_ptr<apache::thrift::server::TAcceptQueueEntry> > &) (function_obj_ptr=..., a0=<optimized out>, a1=...) > at > ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159 > #12 0x00000000010d3e59 in operator() (a1=..., a0=1, this=0x7f3279ea9510) at > ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 > #13 > impala::ThreadPool<boost::shared_ptr<apache::thrift::server::TAcceptQueueEntry> > >::WorkerThread (this=0x7f3279ea94c0, thread_id=1) at > ../util/thread-pool.h:166 > #14 0x000000000144f8f2 in operator() (this=0x7f3277ea5b40) at > ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770 > #15 impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const&, > std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, > impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., > functor=..., parent_thread_info=<optimized out>, > thread_started=0x7f3279ea9110) at thread.cc:360 > #16 0x0000000001450d6b in operator()<void (*)(const > std::__cxx11::basic_string<char>&, const std::__cxx11::basic_string<char>&, > boost::function<void()>, const impala::ThreadDebugInfo*, impala::Promise<long > int>*), boost::_bi::list0> (a=<synthetic pointer>, > f=@0x1417ccf8: 0x144f5f0 > <impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const&, > std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, > impala::Promise<long, (impala::PromiseMode)0>*)>, this=0x1417cd00) at > ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531 > #17 operator() (this=0x1417ccf8) at > ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222 > #18 boost::detail::thread_data<boost::_bi::bind_t<void, void > (*)(std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > const&, std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const&, boost::function<void > ()>, impala::ThreadDebugInfo const*, impala::Promise<long, > (impala::PromiseMode)0>*), > boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > >, > boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, > boost::_bi::value<impala::ThreadDebugInfo*>, > boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > > >::run() (this=0x1417cb40) > at > ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116 > #19 0x0000000001ca17c2 in thread_proxy () > #20 0x00007f3b9a7f2dd5 in start_thread () from ./lib64/libpthread.so.0 > #21 0x00007f3b972f8ead in clone () from ./lib64/libc.so.6 > {noformat} > As discussed in IMPALA-7638, we need to distinguish between a client that is > slow to finish sasl negotiation (due to slow Kerberos negotiation) from a > client that is never going to do anything beyond the tcp handshake and time > these out much sooner than the *sasl_connect_tcp_timeout_ms* duration. > The logging pattern below captures some instances of these abnormal > connections > {noformat} > I1007 12:16:07.636166 185038 TAcceptQueueServer.cpp:227] TAcceptQueueServer: > hiveserver2-frontend started connection setup for client <Host: x.x.x.x Port: > 32878> > I1007 12:21:06.634896 185038 thrift-util.cc:96] TAcceptQueueServer: > hiveserver2-frontend connection setup failed for client <Host: x.x.x.x Port: > 32878>. Caught TException: No more data to read. > ==== > I1007 12:16:23.488011 185039 TAcceptQueueServer.cpp:227] TAcceptQueueServer: > hiveserver2-frontend started connection setup for client <Host: x.x.x.x Port: > 33934> > I1007 12:21:22.488610 185039 thrift-util.cc:96] TAcceptQueueServer: > hiveserver2-frontend connection setup failed for client <Host: x.x.x.x Port: > 33934>. Caught TException: No more data to read. > {noformat} > This instance of ImpalaServer was running with > *accepted_cnxn_setup_thread_pool_size*=2. That means that both of these > threads (185038 &185039) were tied up during this 5-minute window. Subsequent > incoming client connections to the HS2 interface will be in the accept queue > until one of these threads frees up. If more of those accepted connections > do not start sasl negotiation, the problem will snowball. > Attached a simple tcp client [^simple_tcp_client.py] that when called more > than once in quick succession, will block port 21050 and cause otherwise > legitimate connections from Impala-supported clients to wait in the "accept" > queue for at least 5 minutes. > {noformat} > # python simple_tcp_client.py & > [1] 19986 > 2022-10-12 11:28:16 INFO Created a tcp client > 2022-10-12 11:28:16 INFO Connecting to: c908086-2.vpc.cloudera.com:21050 > 2022-10-12 11:28:16 INFO Client1 connected > 2022-10-12 11:28:16 INFO Sleeping for 5 minutes > # python simple_tcp_client.py & > [2] 19989 > 2022-10-12 11:28:20 INFO Created a tcp client > 2022-10-12 11:28:20 INFO Connecting to: c908086-2.vpc.cloudera.com:21050 > 2022-10-12 11:28:20 INFO Client1 connected > 2022-10-12 11:28:20 INFO Sleeping for 5 minutes > {noformat} > {noformat} > # impala-shell -i c908086-2.vpc.cloudera.com -d default -k --protocol=hs2 > Starting Impala Shell using Kerberos authentication > Using service name 'impala' > Socket error None: timed out > *********************************************************************************** > Welcome to the Impala shell. > (Impala Shell v3.4.0-SNAPSHOT (a1dfdfd) built on Sun Aug 21 10:10:08 UTC 2022) > When pretty-printing is disabled, you can use the '--output_delimiter' flag > to set > the delimiter for fields in the same row. The default is '\t'. > *********************************************************************************** > [Not connected] > > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org