[ https://issues.apache.org/jira/browse/THRIFT-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Pierce updated THRIFT-1103:
--------------------------------

    Attachment: THRIFT-1103.tzlibtransport_for_python_v1.patch

Patch attached. Adds TZlibTransport.py into ./lib/py/src/transport/ and adds TZlibTransport to the __all__ list in transport/__init__.py.

I tested this on Python 2.4 and 2.7. The zlib module is present in both and provides the same API for our needs.

If the patch for THRIFT-1094 is good and can be committed, it would be easier for me to extend the RunClientServer.py/TestServer.py/TestClient.py code to include testing that exercises the TZlibTransport code. (I did this locally in my copy of thrift-svn/trunk to test this code, but didn't want to submit a patch that requires another patch (THRIFT-1094) which hasn't been approved yet.)

> TZlibTransport for python, a zlib compressed transport
> ------------------------------------------------------
>
>                 Key: THRIFT-1103
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1103
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Python - Library
>            Reporter: Will Pierce
>            Assignee: Will Pierce
>         Attachments: THRIFT-1103.tzlibtransport_for_python_v1.patch
>
>
> New implementation of zlib compressed transport for python.
> The attached patch provides a zlib compressed transport wrapper for python.
> It is similar to the TFramedTransport, in that it wraps another transport,
> implementing the data compression as a transformation layer on top of the
> underlying transport that it wraps.
> The compression level is configurable in the constructor, from 0 (none) to 9
> (best) and defaults to 9 for best compression. The way this works is that
> every write() to the transport appends more data to the internal cStringIO
> write buffer. When the transport's flush() method is called, the buffered
> bytes are then passed to a zlib Compressor object and flush()ed with
> zlib.Z_SYNC_FLUSH.
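
The buffer-then-compress behaviour described above can be sketched roughly as follows. This is not the code from the attached patch: the class name ZlibWriteSketch and its attributes are hypothetical, io.BytesIO stands in for the cStringIO buffer so the sketch runs on Python 3, and the wrapped transport is assumed to be any object with a write(bytes) method.

```python
import io
import zlib


class ZlibWriteSketch:
    """Hypothetical stand-in for the write side of a zlib-compressed transport."""

    def __init__(self, sink, compresslevel=9):
        self._sink = sink                  # assumed: any object with write(bytes)
        self._wbuf = io.BytesIO()          # uncompressed bytes waiting for flush()
        self._zcomp = zlib.compressobj(compresslevel)

    def write(self, data):
        # Nothing is compressed yet; bytes only accumulate in the buffer.
        self._wbuf.write(data)

    def flush(self):
        raw = self._wbuf.getvalue()
        self._wbuf = io.BytesIO()
        # Z_SYNC_FLUSH emits everything buffered so far without ending the
        # stream, so the same compressor can be reused for the next message.
        out = self._zcomp.compress(raw) + self._zcomp.flush(zlib.Z_SYNC_FLUSH)
        self._sink.write(out)
        return len(raw), len(out)


sink = io.BytesIO()
transport = ZlibWriteSketch(sink)
transport.write(b"hello ")
transport.write(b"thrift " * 50)
raw_len, comp_len = transport.flush()

# A decompressor fed the flushed bytes can recover the whole message.
decoded = zlib.decompressobj().decompress(sink.getvalue())
```

Because the compressor is kept across flushes, later messages on the same connection can reuse earlier data as dictionary context, which is one plausible reason to prefer Z_SYNC_FLUSH over finishing the stream on every flush.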
> Because the thrift API calls the transport's flush() after writeMessageEnd(),
> this means very small thrift RPC calls don't get compressed well. This
> transport works best on thrift protocols where the payload contains strings
> longer than 10 characters. As with all data compression, the more redundancy
> in the uncompressed input, the greater the resulting compression.
> The TZlibTransport class also implements some basic statistics that track the
> number of raw bytes written and read, versus the decompressed equivalent.
> The getCompRatio() method returns a tuple of
> (readCompressionRatio,writeCompressionRatio) where ratio is computed using:
> compressed_bytes/uncompressed_bytes. (So data compressed to 10% of its
> original size gives 0.10; smaller numbers are better.) The getCompSavings()
> method returns the actual number of (saved_read_bytes,saved_write_bytes),
> which might be negative when the compression of non-compressible data ends
> up expanding the data. So hopefully, anyone who uses this transport will be
> able to tell whether the compression is saving bandwidth or not.
> I will add the patch in a few minutes.
> I haven't tested this against the C++ TZlibTransport, only against itself.
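
To make the ratio and savings arithmetic concrete, here is a small sketch. The helpers comp_ratio and comp_savings are hypothetical names for illustration and are not the patch's getCompRatio()/getCompSavings() methods; they only apply the compressed_bytes/uncompressed_bytes formula quoted above to a single direction, and they use zlib.compress() on one-shot inputs rather than a transport.

```python
import zlib


def comp_ratio(compressed_bytes, uncompressed_bytes):
    """compressed/uncompressed; smaller is better. None if nothing was sent."""
    if uncompressed_bytes == 0:
        return None
    return compressed_bytes / float(uncompressed_bytes)


def comp_savings(compressed_bytes, uncompressed_bytes):
    """Bytes saved by compressing; negative when compression expanded the data."""
    return uncompressed_bytes - compressed_bytes


# Highly redundant payload: compresses well, so the ratio is well below 1.
redundant = b"thrift " * 100
redundant_ratio = comp_ratio(len(zlib.compress(redundant, 9)), len(redundant))

# 256 distinct byte values with no repeats: nothing for zlib to exploit, so
# the header and checksum overhead makes the output larger than the input.
incompressible = bytes(range(256))
incompressible_savings = comp_savings(
    len(zlib.compress(incompressible, 9)), len(incompressible)
)
```

A transport tracking both directions would presumably keep one pair of counters for reads and another for writes and report each pair through helpers like these.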