John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/2/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS2, Line 158: socket->setRecvTimeout(0);
             :     socket->setSendTimeout(0);
> Ah, that's what I was missing, thanks John. Still trying to make sure I und
Sorry, reposting this comment so it's not all on one line: If I'm reading the 
thrift code correctly, fixing the lock doesn't really help the specific issue 
I've seen. The C++ thrift server implementation ends up calling getTransport on 
the same thread that does the accept of the connection. So if the getTransport 
implementation does reads/writes that ends up blocking, it prevents new 
connections from being accepted until the read/write completes or times out. We 
ran into this issue when a client connecting to the hs2_port started the SASL 
handshake then for some reason quits communicating with the server (the packets 
from the client to impalad started to not make it to impalad even the RST 
packet that the client sends after retrying). In this case all other connection 
requests to the hs2_port end up being queued until I think the tcp keepalive 
kicks in (I think the default time is 2 hours) or impalad is restarted. (The 
Java implementation of TThreadPoolServer puts the accept!
 ed connection on a different thread before calling getTransport 
https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/server/TThreadPoolServer.java)
 If the C++ thrift implementation put the connection on a different thread 
before calling getTransport, fixing the locking would be helpful.

C++ implementation: 
https://github.com/apache/thrift/blob/0.10.0/lib/cpp/src/thrift/server/TServerFramework.cpp
 lines 147-157
 
Backtrace ends up lookings something like this (from IMPALA-5268):
#6  0x0000000001b394d2 in apache::thrift::transport::TSSLSocket::read(unsigned 
char*, unsigned int) ()
No symbol table info available.
#7  0x0000000001b36003 in unsigned int 
apache::thrift::transport::readAll<apache::thrift::transport::TSocket>(apache::thrift::transport::TSocket&,
 unsigned char*, unsigned int) ()
No symbol table info available.
#8  0x0000000000b52e15 in 
apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*,
 unsigned int*) ()
No symbol table info available.
#9  0x0000000000b50733 in 
apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() ()
No symbol table info available.
#10 0x0000000000b52f9e in apache::thrift::transport::TSaslTransport::open() ()
No symbol table info available.
#11 0x0000000000b51431 in 
apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr<apache::thrift::transport::TTransport>)
 ()
No symbol table info available.
#12 0x0000000001b4066d in apache::thrift::server::TThreadPoolServer::serve() ()
No symbol table info available.
#13 0x00000000009f2e60 in 
impala::ThriftServer::ThriftServerEventProcessor::Supervise() ()
No symbol table info available.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <j...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <j...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to