TZlibTransport for python, a zlib compressed transport
------------------------------------------------------

                 Key: THRIFT-1103
                 URL: https://issues.apache.org/jira/browse/THRIFT-1103
             Project: Thrift
          Issue Type: New Feature
          Components: Python - Library
            Reporter: Will Pierce
            Assignee: Will Pierce


New implementation of a zlib compressed transport for Python.

The attached patch provides a zlib compressed transport wrapper for Python.
Like TFramedTransport, it wraps another transport and implements data
compression as a transformation layer on top of that underlying transport.

The compression level is configurable in the constructor, from 0 (none) to 9 
(best), and defaults to 9 for best compression.  Every write() to the 
transport appends data to an internal cStringIO write buffer.  When the 
transport's flush() method is called, the buffered bytes are passed to a zlib 
compressor object, which is then flushed with zlib.Z_SYNC_FLUSH.
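
To make the buffering and flush behaviour concrete, here is a minimal sketch 
of the write path described above.  It is not the code from the patch: the 
class name SketchZlibWriter and its attributes are hypothetical, io.BytesIO 
stands in for both cStringIO and the wrapped transport, and it assumes 
Python 3.

```python
import io
import zlib


class SketchZlibWriter:
    """Hypothetical stand-in for the write side of a zlib transport."""

    def __init__(self, sink, compresslevel=9):
        self._sink = sink              # stand-in for the wrapped transport
        self._wbuf = io.BytesIO()      # plays the role of the cStringIO buffer
        self._compressor = zlib.compressobj(compresslevel)

    def write(self, data):
        # Nothing is compressed yet; bytes only accumulate in the buffer.
        self._wbuf.write(data)

    def flush(self):
        raw = self._wbuf.getvalue()
        self._wbuf = io.BytesIO()
        # Z_SYNC_FLUSH emits everything buffered so far, so the peer can
        # decompress this message without waiting for the stream to end.
        out = self._compressor.compress(raw)
        out += self._compressor.flush(zlib.Z_SYNC_FLUSH)
        self._sink.write(out)


sink = io.BytesIO()
writer = SketchZlibWriter(sink)
writer.write(b"hello ")
writer.write(b"world")
writer.flush()

# A decompressor on the reading side recovers the flushed bytes.
restored = zlib.decompressobj().decompress(sink.getvalue())
```

Because the compressor object persists across flush() calls, later messages 
continue the same zlib stream rather than starting a new one.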

Because the Thrift API calls the transport's flush() after writeMessageEnd(), 
every message ends with a sync flush, so very small Thrift RPC calls don't 
compress well.  This transport works best on Thrift protocols where the 
payload contains strings longer than 10 characters.  As with all data 
compression, the more redundancy in the uncompressed input, the greater the 
resulting compression.
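
The effect of flushing after every message can be checked with plain zlib.  
The sketch below is illustrative only: the helper sync_flushed_size() and the 
two sample payloads are made up for this example and are not part of the 
patch.

```python
import zlib


def sync_flushed_size(payload, level=9):
    """Return the size of payload after compress() plus a Z_SYNC_FLUSH."""
    compressor = zlib.compressobj(level)
    out = compressor.compress(payload)
    out += compressor.flush(zlib.Z_SYNC_FLUSH)
    return len(out)


tiny = b"ping"                     # a very small message body
redundant = b"status=OK;" * 200    # 2000 bytes with heavy repetition

tiny_size = sync_flushed_size(tiny)
redundant_size = sync_flushed_size(redundant)

print(len(tiny), "->", tiny_size)
print(len(redundant), "->", redundant_size)
```

The tiny payload comes out larger than it went in, since the zlib header and 
the sync flush marker cost more than the four bytes being sent, while the 
repetitive payload shrinks substantially.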

The TZlibTransport class also implements some basic statistics that track the 
number of compressed bytes written and read against their uncompressed 
equivalents.  The getCompRatio() method returns a tuple of 
(readCompressionRatio, writeCompressionRatio), where each ratio is computed 
as compressed_bytes/uncompressed_bytes.  (So data compressed to 10% of its 
original size gives 0.10, meaning smaller numbers are better.)  The 
getCompSavings() method returns the actual number of 
(saved_read_bytes, saved_write_bytes), which can be negative when compressing 
non-compressible data ends up expanding it.  This should let anyone using the 
transport tell whether the compression is actually saving bandwidth.
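
As a worked example of the arithmetic behind these two methods, the sketch 
below computes a ratio and a savings figure from a pair of byte counts.  The 
function names comp_ratio() and comp_savings() are hypothetical, and 
returning 0.0 when nothing has been transferred yet is an assumption; the 
patch may handle that case differently.

```python
def comp_ratio(compressed_bytes, uncompressed_bytes):
    """compressed_bytes / uncompressed_bytes; smaller is better."""
    if uncompressed_bytes == 0:
        return 0.0  # assumption: no traffic yet, avoid dividing by zero
    return compressed_bytes / uncompressed_bytes


def comp_savings(compressed_bytes, uncompressed_bytes):
    """Bytes saved by compression; negative if the data expanded."""
    return uncompressed_bytes - compressed_bytes


# 1000 bytes of payload sent as 100 bytes on the wire.
good_ratio = comp_ratio(100, 1000)
good_savings = comp_savings(100, 1000)

# A 4-byte payload that grew to 14 bytes on the wire.
bad_ratio = comp_ratio(14, 4)
bad_savings = comp_savings(14, 4)
```

Here good_ratio is 0.1 with 900 bytes saved, while the second case has a 
ratio of 3.5 and savings of -10, which is the signal that compression is 
costing bandwidth rather than saving it.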

I will add the patch in a few minutes.

I haven't tested this against the C++ TZlibTransport, only against itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
