Currently we always retain the data in the format in which it was produced.
It would be a fairly small change, though, to add a server-override
parameter that would have the server either always compress the data or
always decompress it before writing it to the log.
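
For concreteness, a minimal sketch of what that override might look like, in
Scala. The LogCompressionOverride modes and normalizeForLog are invented for
this sketch, not actual Kafka configuration or API; only the java.util.zip
calls are real.

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
    import java.util.zip.{GZIPInputStream, GZIPOutputStream}

    // Hypothetical override modes -- names invented for illustration.
    sealed trait LogCompressionOverride
    case object RetainProducerFormat extends LogCompressionOverride // today's behavior
    case object AlwaysCompress extends LogCompressionOverride
    case object AlwaysDecompress extends LogCompressionOverride

    def gzip(bytes: Array[Byte]): Array[Byte] = {
      val bos = new ByteArrayOutputStream()
      val out = new GZIPOutputStream(bos)
      out.write(bytes); out.close()
      bos.toByteArray
    }

    def gunzip(bytes: Array[Byte]): Array[Byte] = {
      val in  = new GZIPInputStream(new ByteArrayInputStream(bytes))
      val bos = new ByteArrayOutputStream()
      val buf = new Array[Byte](4096)
      var n   = in.read(buf)
      while (n != -1) { bos.write(buf, 0, n); n = in.read(buf) }
      bos.toByteArray
    }

    // Normalize an incoming batch to the log's configured format before the
    // append; with RetainProducerFormat it is written exactly as produced.
    def normalizeForLog(batch: Array[Byte], producerCompressed: Boolean,
                        mode: LogCompressionOverride): Array[Byte] = mode match {
      case AlwaysCompress   if !producerCompressed => gzip(batch)
      case AlwaysDecompress if producerCompressed  => gunzip(batch)
      case _                                       => batch // already in the target format
    }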

-Jay

On Thu, Jul 19, 2012 at 10:24 AM, Michal Hariš <michal.har...@gmail.com> wrote:

> Does it mean that, currently, if a producer publishes an uncompressed
> message to a server whose local log format is configured as compressed,
> consumers will receive compressed messages when fetching?
>  On Jul 19, 2012 5:08 PM, "Jay Kreps (JIRA)" <j...@apache.org> wrote:
>
> >
> > [ https://issues.apache.org/jira/browse/KAFKA-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418395#comment-13418395 ]
> >
> > Jay Kreps commented on KAFKA-406:
> > ---------------------------------
> >
> > Oh yes, and the other design requirement we had was that messages not be
> > re-compressed on a fetch request. A simple implementation that didn't have
> > this requirement would be to have the consumer request N messages and
> > specify whether or not to compress, and have the server read these into
> > memory, decompress them if its local log format is compressed, then batch
> > compress exactly the messages the client asked for, and send just that.
> > The problem with this is that we have about a 5x read-to-write ratio, so
> > recompressing on each read means recompressing the same data 5 times on
> > average. This makes consumption much more expensive. I don't think this is
> > a hard requirement, but to make that approach fly we would have to
> > demonstrate that the CPU overhead of compression would not become a
> > serious bottleneck. I know this won't work with GZIP, but it might be
> > possible with Snappy or a faster compression algorithm.
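> >
> > (To put rough numbers on that claim: with the ~5x read-to-write ratio,
> > compress-on-produce costs one compression per batch, while recompressing
> > on every fetch adds about five more, so the naive scheme spends roughly
> > 6x the compression CPU, five sixths of it redoing work already done once.)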
> >
> > > Gzipped payload is a fully wrapped Message (with headers), not just
> > > payload
> > > ---------------------------------------------------------------------------
> > >
> > >                 Key: KAFKA-406
> > >                 URL: https://issues.apache.org/jira/browse/KAFKA-406
> > >             Project: Kafka
> > >          Issue Type: Bug
> > >          Components: core
> > >    Affects Versions: 0.7.1
> > >         Environment: N/A
> > >            Reporter: Lorenzo Alberton
> > >
> > > When creating a gzipped MessageSet, the collection of Messages is passed
> > > to CompressionUtils.compress(), where each message is serialised [1] into
> > > a buffer (not just the payload, but the full Message with headers,
> > > uncompressed), then gzipped, and finally wrapped into another Message [2].
> > > In other words, the consumer has to unwrap the Message flagged as gzipped,
> > > unzip the payload, and unwrap the unzipped payload again as a
> > > non-compressed Message.
> > > Is this double-wrapping the intended behaviour?
> > > [1] messages.foreach(m => m.serializeTo(messageByteBuffer))
> > > [2] new Message(outputStream.toByteArray, compressionCodec)
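> > >
> > > (For illustration, a toy Scala sketch of that double unwrap. Msg and the
> > > readMsgs parameter are stand-ins invented here, not the real 0.7
> > > kafka.message classes; gunzip is the java.util.zip helper defined in the
> > > sketch at the top of this thread:)
> > >
> > >     // Toy stand-in for a Message: a codec flag plus raw payload bytes.
> > >     case class Msg(gzipped: Boolean, payload: Array[Byte])
> > >
> > >     // Consumer side: the outer Msg's payload is a gzipped buffer that
> > >     // itself holds fully framed Msgs (headers included), so each inner
> > >     // Msg must be unwrapped again to reach the application payloads.
> > >     def unwrap(outer: Msg, readMsgs: Array[Byte] => List[Msg]): List[Array[Byte]] =
> > >       if (outer.gzipped) readMsgs(gunzip(outer.payload)).map(_.payload)
> > >       else List(outer.payload)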
> >
