[ 
https://issues.apache.org/jira/browse/THRIFT-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17647623#comment-17647623
 ] 

Benjamin Mahler commented on THRIFT-5673:
-----------------------------------------

It gets called because the upper layer asks to read some data (in this case, 
the start of the JSON-encoded Thrift message, which is a '[' character). When 
the zlib layer gets EOF from the underlying transport, it should not loop 
forever like this, burning a CPU core. 
Let's compare with the C++ and Java implementations:

C++: the same layer appears to surface the EOF (0 bytes read) to the caller 
instead of looping: 
https://github.com/apache/thrift/blob/v0.17.0/lib/cpp/src/thrift/transport/TZlibTransport.cpp#L178-L182

Java: it appears to throw an IOException wrapping a TTransportException that, 
I think, will have an END_OF_FILE type:
https://github.com/apache/thrift/blob/v0.17.0/lib/java/src/main/java/org/apache/thrift/transport/TZlibTransport.java#L99-L118
https://github.com/apache/thrift/blob/v0.17.0/lib/java/src/main/java/org/apache/thrift/transport/TTransportException.java#L33

So, at least based on a read through the code, the C++ and Java equivalents 
appear to surface the unexpected EOF to the caller rather than loop 
infinitely.
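
To make the expected behavior concrete, here is a minimal, self-contained 
sketch of a read loop that treats 0 bytes from the underlying transport as EOF 
instead of retrying. It is not the actual TZlibTransport.py code: 
FakeTransport and EofAwareZlibReader are hypothetical names invented for 
illustration, and the built-in EOFError stands in for whatever exception the 
real transport would raise (presumably a TTransportException of type 
END_OF_FILE).

```python
import io
import zlib


class FakeTransport:
    """Hypothetical stand-in for the underlying transport.

    Like the HTTP response described in this issue, read() returns b"" once
    the data is exhausted instead of raising.
    """

    def __init__(self, payload):
        self._buf = io.BytesIO(payload)

    def read(self, sz):
        return self._buf.read(sz)


class EofAwareZlibReader:
    """Hypothetical reader that surfaces EOF rather than spinning."""

    def __init__(self, trans):
        self._trans = trans
        self._zcomp_read = zlib.decompressobj()
        self._rbuf = b""

    def read(self, sz):
        # Pull compressed data until sz bytes are buffered or the underlying
        # transport reports EOF.
        while len(self._rbuf) < sz:
            chunk = self._trans.read(4096)
            if not chunk:
                # 0 bytes from below means EOF. Return whatever is buffered,
                # or raise if there is nothing left to hand to the caller.
                if not self._rbuf:
                    raise EOFError("underlying transport ended unexpectedly")
                break
            self._rbuf += self._zcomp_read.decompress(chunk)
        out, self._rbuf = self._rbuf[:sz], self._rbuf[sz:]
        return out
```

With this shape, a caller asking for 1 byte from an exhausted transport gets an 
exception on the first attempt, which is roughly what the C++ and Java code 
paths linked above appear to do.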

> TZlibTransport.py gets into infinite loop if underlying transport ends 
> unexpectedly.
> ------------------------------------------------------------------------------------
>
>                 Key: THRIFT-5673
>                 URL: https://issues.apache.org/jira/browse/THRIFT-5673
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>            Reporter: Benjamin Mahler
>            Priority: Major
>
> In Python, when using a Thrift client composed as follows, the 
> TZlibTransport.py logic can enter an infinite loop when the response ends 
> prematurely (e.g. on disconnection):
> {code:python}
> import zlib
>
> from thrift.transport import TTransport, TZlibTransport
>
> class GzipTransport(TZlibTransport.TZlibTransport):
>   def _init_zlib(self):
>     """Override of internal method for setting up the zlib compression and
>     decompression objects, to support 'gzip' content encoding.
>     """
>     self._zcomp_read = zlib.decompressobj(wbits=zlib.MAX_WBITS + 16)
>     self._zcomp_write = zlib.compressobj(self.compresslevel, wbits=zlib.MAX_WBITS + 16)
>
> # http_client is the underlying HTTP transport; its construction is not shown.
> http_client.setCustomHeaders({"Content-Encoding": "gzip", "Accept-Encoding": "gzip"})
> transport = GzipTransport(http_client)
> transport = TTransport.TBufferedTransport(transport, rbuf_size=1024 * 1024)
> # protocol is built elsewhere (not shown); the stack trace below shows TJSONProtocol.
> thriftClient = ReadOnlyScheduler.Client(protocol)
> http_client.open()
> try:
>   transport.open()
>   return thriftClient.getFoo(FooQuery(...))
> finally:
>   transport.close()
> {code}
> We observed that when the server shuts down at the same time as the response 
> is being read, the program spins on the following stack trace:
> {code:java}
>   File: "client.py", line 47, in query_scheduler_api
>     return thrift_client_method(thrift_client, *args)
>   File: "gen/apache/aurora/api/ReadOnlyScheduler.py", line 198, in getJobSummary
>     return self.recv_getJobSummary()
>   File: "gen/apache/aurora/api/ReadOnlyScheduler.py", line 210, in recv_getJobSummary
>     (fname, mtype, rseqid) = iprot.readMessageBegin()
>   File: "thrift/protocol/TJSONProtocol.py", line 417, in readMessageBegin
>     self.readJSONArrayStart()
>   File: "thrift/protocol/TJSONProtocol.py", line 405, in readJSONArrayStart
>     self.readJSONSyntaxChar(LBRACKET)
>   File: "thrift/protocol/TJSONProtocol.py", line 253, in readJSONSyntaxChar
>     current = self.reader.read()
>   File: "thrift-0.13.0-cp37-cp37m-linux_x86_64.whl/thrift/protocol/TJSONProtocol.py", line 164, in read
>     self.data = self.protocol.trans.read(1)
>   File: "thrift/transport/TTransport.py", line 164, in read
>     self.__rbuf = BufferIO(self.__trans.read(max(sz, self.__rbuf_size)))
>   File: "thrift/transport/TZlibTransport.py", line 191, in read
>     if self.readComp(sz):
>   File: "thrift/transport/TZlibTransport.py", line 208, in readComp
>     return False
> {code}
> Essentially, it appears that when the JSON protocol asks to read 1 byte and 
> the HTTP response keeps returning 0 bytes (EOF), the zlib transport spins in 
> the loop here: 
> [https://github.com/apache/thrift/blob/v0.17.0/lib/py/src/transport/TZlibTransport.py#L190-L192]
> This seems to be a bug: the zlib transport should treat 0 bytes coming back 
> from the underlying transport as EOF, at which point it closes its 
> compression streams and tells the caller about the remaining data and EOF 
> (possibly throwing an exception if the EOF does not cleanly finish the 
> decompression stream).
> In our case, the underlying transport is ultimately the HttpClient's 
> response, which doesn't appear to throw an exception when the response is 
> cut short; it just appears to return 0 bytes repeatedly.
> We use Thrift 0.13.0, but I see the same logic in the latest release, so the 
> bug appears to be unfixed there as well.
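
The description's suggestion of raising when EOF does not cleanly finish the 
decompression stream can be illustrated with the standard zlib module alone. 
The sketch below is an assumption about how such a check might look, not code 
from Thrift: gzip_bytes and decompress_until_eof are hypothetical helper 
names, and EOFError is a placeholder for the transport's own exception type. 
It relies on the `eof` attribute of zlib decompression objects, which is True 
only once the end of the compressed stream has been reached.

```python
import zlib

# Same wbits value as the GzipTransport in the report: gzip framing.
GZIP_WBITS = zlib.MAX_WBITS + 16


def gzip_bytes(payload):
    """Hypothetical helper: gzip-compress payload in one shot."""
    comp = zlib.compressobj(wbits=GZIP_WBITS)
    return comp.compress(payload) + comp.flush()


def decompress_until_eof(compressed):
    """Hypothetical helper: decompress, raising if the stream is truncated."""
    decomp = zlib.decompressobj(wbits=GZIP_WBITS)
    out = decomp.decompress(compressed)
    if not decomp.eof:
        # The input ran out before the end-of-stream marker was seen.
        raise EOFError("compressed stream ended before end-of-stream marker")
    return out
```

A complete gzip payload round-trips through decompress_until_eof, while the 
same payload with its last byte removed raises EOFError, which is the 
distinction between a clean EOF and a response that was cut short.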



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
