Currently we always retain the data in the format in which it was produced. It would be a fairly small change, though, to add a server-override parameter that makes the server either always compress the data or always decompress it before writing it to the log.
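
To make the idea concrete, here is a rough sketch of what that normalization might look like at append time. All of the names below are hypothetical, not actual Kafka 0.7 code, and standard-library gzip stands in for whatever codec the log is configured with:

import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

// Hypothetical server-side override: normalize each incoming payload to
// the log's configured format once, before it is written to the log.
object LogCompressionOverride {
  sealed trait Mode
  case object KeepProducerFormat extends Mode // current behaviour
  case object AlwaysCompress extends Mode
  case object AlwaysDecompress extends Mode

  def normalize(payload: Array[Byte], isCompressed: Boolean, mode: Mode): Array[Byte] =
    mode match {
      case AlwaysCompress if !isCompressed  => gzip(payload)
      case AlwaysDecompress if isCompressed => gunzip(payload)
      case _                                => payload // already in the target format, or no override
    }

  private def gzip(bytes: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(bytes)
    gz.close()
    bos.toByteArray
  }

  private def gunzip(bytes: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    val bos = new ByteArrayOutputStream()
    val buf = new Array[Byte](4096)
    var n = in.read(buf)
    while (n != -1) { bos.write(buf, 0, n); n = in.read(buf) }
    bos.toByteArray
  }
}

Since the cost is paid once on the write path, the fetch path can keep serving bytes straight off the log with no re-compression, which is what keeps reads cheap.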
-Jay

On Thu, Jul 19, 2012 at 10:24 AM, Michal Hariš <michal.har...@gmail.com> wrote:

> Does it mean that currently, if a producer publishes an uncompressed
> message to a server whose local log format is configured to compressed,
> consumers will receive compressed messages when fetching?
>
> On Jul 19, 2012 5:08 PM, "Jay Kreps (JIRA)" <j...@apache.org> wrote:
>
> > [ https://issues.apache.org/jira/browse/KAFKA-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418395#comment-13418395 ]
> >
> > Jay Kreps commented on KAFKA-406:
> > ---------------------------------
> >
> > Oh yes, and the other design requirement we had was that messages not
> > be re-compressed on a fetch request. A simple implementation that
> > dropped this requirement would have the consumer request N messages and
> > specify whether or not to compress; the server would read those
> > messages into memory, decompress them if its local log format is
> > compressed, batch-compress exactly the messages the client asked for,
> > and send just that. The problem with this is that we have roughly a 5x
> > read-to-write ratio, so recompressing on each read means recompressing
> > the same data five times on average, which makes consumption far more
> > expensive. I don't think this is a hard requirement, but to make that
> > approach fly we would have to demonstrate that the CPU overhead of
> > compression would not become a serious bottleneck. I know this won't
> > work with GZIP, but it might be possible with Snappy or a faster
> > compression algorithm.
> >
> > > Gzipped payload is a fully wrapped Message (with headers), not just
> > > payload
> > > ---------------------------------------------------------------------
> > >
> > >                 Key: KAFKA-406
> > >                 URL: https://issues.apache.org/jira/browse/KAFKA-406
> > >             Project: Kafka
> > >          Issue Type: Bug
> > >          Components: core
> > >    Affects Versions: 0.7.1
> > >        Environment: N/A
> > >           Reporter: Lorenzo Alberton
> > >
> > > When creating a gzipped MessageSet, the collection of Messages is
> > > passed to CompressionUtils.compress(), where each message is
> > > serialised [1] into a buffer (not just the payload: the full Message,
> > > with headers, uncompressed), then gzipped, and finally wrapped into
> > > another Message [2].
> > > In other words, the consumer has to unwrap the Message flagged as
> > > gzipped, unzip the payload, and unwrap the unzipped payload again as
> > > a non-compressed Message.
> > > Is this double-wrapping the intended behaviour?
> > > [1] messages.foreach(m => m.serializeTo(messageByteBuffer))
> > > [2] new Message(outputStream.toByteArray, compressionCodec)
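
(For anyone skimming the quoted report: below is a minimal, self-contained sketch of the double wrap/unwrap Lorenzo describes. The framing is deliberately simplified and hypothetical, just a one-byte codec flag in front of the payload, not the real 0.7 byte layout with its magic byte, attributes, and CRC.)

import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object DoubleWrapDemo {
  // Hypothetical codec flags, standing in for the Message header fields.
  val NoCodec: Byte = 0
  val GzipCodec: Byte = 1

  // A "message" here is just a codec flag prepended to the payload bytes.
  def wrap(codec: Byte, payload: Array[Byte]): Array[Byte] =
    codec +: payload

  // Producer side: serialize the *full* inner message (flag + payload),
  // gzip those bytes, then wrap the result in an outer gzip-flagged message.
  def produce(payload: Array[Byte]): Array[Byte] =
    wrap(GzipCodec, gzip(wrap(NoCodec, payload)))

  // Consumer side: unwrap the outer message, gunzip, then unwrap *again*
  // to reach the actual payload. This is the double unwrap in question.
  def consume(message: Array[Byte]): Array[Byte] = {
    require(message.head == GzipCodec)
    val inner = gunzip(message.tail) // a full message, not a bare payload
    require(inner.head == NoCodec)
    inner.tail
  }

  private def gzip(bytes: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(bytes)
    gz.close()
    bos.toByteArray
  }

  private def gunzip(bytes: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    val bos = new ByteArrayOutputStream()
    val buf = new Array[Byte](4096)
    var n = in.read(buf)
    while (n != -1) { bos.write(buf, 0, n); n = in.read(buf) }
    bos.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val out = consume(produce("hello".getBytes("UTF-8")))
    println(new String(out, "UTF-8")) // prints: hello
  }
}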