[
https://issues.apache.org/jira/browse/THRIFT-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914901#comment-17914901
]
Csaba Ringhofer commented on THRIFT-5846:
-----------------------------------------
>It would also make sense to skip flushing in TBufferedTransport::close() if
>the underlying transport is not open.
Thought a bit more about this - actually I think that
TBufferedTransport::close() should not call flush at all.
Checked TBufferedTransport for other languages and close() simply calls the
close() of the underlying transport:
c#:
https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/netstd/Thrift/Transport/Layered/TBufferedTransport.cs#L79
go:
https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/go/thrift/buffered_transport.go#L63
python:
https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/py/src/transport/TTransport.py#L158
I would expect close() to be callable on a transport after error, but not
flush(). TConnectedClient calls close() even after error, similarly to Impala's
TAcceptQueueServer:
https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/server/TConnectedClient.cpp#L100
Calling flush() is necessary when requesting/responding RPCs, and TProcessor
implementations contain these flush() calls, so the only case when there should
be unflushed data during cleanup is when there was an error/the connection was
closed by the client. In these cases the client shouldn't expect more data and
flush can only do harm.
> Hang when closing TBufferedTransport + TSslTransport
> ----------------------------------------------------
>
> Key: THRIFT-5846
> URL: https://issues.apache.org/jira/browse/THRIFT-5846
> Project: Thrift
> Issue Type: Bug
> Components: C++ - Library
> Reporter: Csaba Ringhofer
> Priority: Major
>
> The issue occurred in Impala tests when switching to OpenSsl 3.2
> (IMPALA-13680).
> While the root cause of the regression may be some change in OpenSsl,
> Thrift's behavior seems also strange as it tries to do SSL handshake when
> closing the transport:
> {code}
> apache::thrift::transport::TSSLSocket::waitForEvent(bool) [TSSLSocket.cpp :
> 881 + 0xa]
> apache::thrift::transport::TSSLSocket::initializeHandshake() [TSSLSocket.cpp
> : 683 + 0x12]
> apache::thrift::transport::TSSLSocket::flush() [TSSLSocket.cpp : 613 + 0x5]
> apache::thrift::transport::TBufferedTransport::close() [TBufferTransports.cpp
> : 133 + 0x3]
> apache::thrift::server::TAcceptQueueServer::Task::run()
> [TAcceptQueueServer.cpp : 111 + 0x3]
> {code}
> Note that Impala uses Thrift 0.16.0 with a few patches so line numbers may
> not match with the original Thrift source code, but the Impala changes do not
> seem relevant in this case.
> The timeline of the events is:
> 1. TSSLSocket wrapped in TBufferedTransport is created and opened
> 2. the first TSSLSocket::initializeHandshake() fails and leads to throwing an
> TTransportException with "SSL_accept: wrong version number (SSL_error_code =
> 1)"
> 3. Impala tries to close the transport
> 4. as the transport is TBufferedTransport, it calls flush on close()
> https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TBufferTransports.h#L236
> 5. TBufferedTransport::flush() calls the underlying transport's flush()
> https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TBufferTransports.cpp#L134
> 6. TSSlSocket::flush() calls initializeHandshake()
> https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TSSLSocket.cpp#L586
> The hang only happens with the "wrong version number" error, other SSL_accept
> errors were handled as before (e.g. "unsupported protocol").
> My expectation is that if the transport's open() was successful then calling
> close() should be safe, even if a later function throws an exception. In case
> of TSSLSocket open() just opens the socket and the first read()/write() will
> do the actual SSL handshake. If the handshake fails then the TSSLSocket
> enters a strange state where isOpen() returns true but the ssl_ object is not
> usable.
> I think that a possible solution is to shutdown+free ssl_ if the handshake
> failed:
> https://github.com/apache/thrift/blob/7734c393ed0f0632c658c05e33a4d6592cf2912c/lib/cpp/src/thrift/transport/TSSLSocket.cpp#L698
> This means that isOpen() would return false after the error +
> TSSLSocket::flush() would not try to reestablish connection. It would also
> make sense to skip flushing in TBufferedTransport::close() if the underlying
> transport is not open.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)