John Sherman has posted comments on this change. Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport ......................................................................
Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7061/2/be/src/transport/TSaslServerTransport.cpp File be/src/transport/TSaslServerTransport.cpp: PS2, Line 158: socket->setRecvTimeout(0); : socket->setSendTimeout(0); > Ah, that's what I was missing, thanks John. Still trying to make sure I und Sorry, reposting this comment so it's not all on one line: If I'm reading the thrift code correctly, fixing the lock doesn't really help the specific issue I've seen. The C++ thrift server implementation ends up calling getTransport on the same thread that does the accept of the connection. So if the getTransport implementation does reads/writes that ends up blocking, it prevents new connections from being accepted until the read/write completes or times out. We ran into this issue when a client connecting to the hs2_port started the SASL handshake then for some reason quits communicating with the server (the packets from the client to impalad started to not make it to impalad even the RST packet that the client sends after retrying). In this case all other connection requests to the hs2_port end up being queued until I think the tcp keepalive kicks in (I think the default time is 2 hours) or impalad is restarted. (The Java implementation of TThreadPoolServer puts the accept! ed connection on a different thread before calling getTransport https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/server/TThreadPoolServer.java) If the C++ thrift implementation put the connection on a different thread before calling getTransport, fixing the locking would be helpful. C++ implementation: https://github.com/apache/thrift/blob/0.10.0/lib/cpp/src/thrift/server/TServerFramework.cpp lines 147-157 Backtrace ends up lookings something like this (from IMPALA-5268): #6 0x0000000001b394d2 in apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) () No symbol table info available. #7 0x0000000001b36003 in unsigned int apache::thrift::transport::readAll<apache::thrift::transport::TSocket>(apache::thrift::transport::TSocket&, unsigned char*, unsigned int) () No symbol table info available. #8 0x0000000000b52e15 in apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*, unsigned int*) () No symbol table info available. #9 0x0000000000b50733 in apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() () No symbol table info available. #10 0x0000000000b52f9e in apache::thrift::transport::TSaslTransport::open() () No symbol table info available. #11 0x0000000000b51431 in apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr<apache::thrift::transport::TTransport>) () No symbol table info available. #12 0x0000000001b4066d in apache::thrift::server::TThreadPoolServer::serve() () No symbol table info available. #13 0x00000000009f2e60 in impala::ThriftServer::ThriftServerEventProcessor::Supervise() () No symbol table info available. -- To view, visit http://gerrit.cloudera.org:8080/7061 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Sherman <j...@arcadiadata.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: John Sherman <j...@arcadiadata.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-HasComments: Yes