Well then I'm not sure I follow the re-compression line of reasoning behind
the double wrapping, but I'm only starting to look into the details of Kafka,
so maybe there's another layer of compression that lumps messages together to
save I/O when writing the log or within the cluster. Anyway, as for the double
wrapping, it would be really good if it were documented as a note wherever the
wire format is described.
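
For anyone else trying to picture the nesting, here is my rough sketch of the
layout, pieced together from the 0.7 Message code and the issue below -- take
it as my reading of the code, not an authoritative spec:

    MessageSet entry (compressed case):
      4 bytes   size of the outer Message
      1 byte    magic (1)
      1 byte    attributes (low 2 bits = compression codec, 1 = GZIP)
      4 bytes   CRC32 of the payload
      N bytes   payload = GZIP(inner MessageSet)

    where the inner MessageSet repeats the same full framing per message:
      4 bytes   size
      1 byte    magic
      1 byte    attributes (codec = 0, i.e. none)
      4 bytes   CRC32
      M bytes   the actual message payload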

On Thu, Jul 19, 2012 at 7:04 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Currently we always retain the data in the format in which it was produced.
> It would be a fairly small change, though, to allow a server-override
> parameter that would have the server either always compress the data or
> always decompress it before writing it to the log.
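
(Trying to make sure I understand the override idea, here's a minimal sketch
of the write-time normalisation it implies -- the LogFormat type, the
batch-as-byte-array framing, and every name below are hypothetical, not real
Kafka 0.7 code:)

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
    import java.util.zip.{GZIPInputStream, GZIPOutputStream}

    // Hypothetical on-disk format policy; AsProduced is today's behaviour.
    sealed trait LogFormat
    case object AsProduced       extends LogFormat
    case object AlwaysCompressed extends LogFormat
    case object AlwaysPlain      extends LogFormat

    def gzip(b: Array[Byte]): Array[Byte] = {
      val bos = new ByteArrayOutputStream
      val gz  = new GZIPOutputStream(bos)
      gz.write(b); gz.close()
      bos.toByteArray
    }

    def gunzip(b: Array[Byte]): Array[Byte] = {
      val in  = new GZIPInputStream(new ByteArrayInputStream(b))
      val out = new ByteArrayOutputStream
      val buf = new Array[Byte](4096)
      var n = in.read(buf)
      while (n >= 0) { out.write(buf, 0, n); n = in.read(buf) }
      out.toByteArray
    }

    // Normalise an incoming batch exactly once, at write time, so that a
    // fetch can still hand back the stored bytes untouched.
    def toLogFormat(batch: Array[Byte], producedGzipped: Boolean,
                    fmt: LogFormat): Array[Byte] = fmt match {
      case AsProduced                          => batch
      case AlwaysCompressed if producedGzipped => batch
      case AlwaysCompressed                    => gzip(batch)
      case AlwaysPlain if producedGzipped      => gunzip(batch)
      case AlwaysPlain                         => batch
    }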
>
> -Jay
>
> On Thu, Jul 19, 2012 at 10:24 AM, Michal Hariš <michal.har...@gmail.com> wrote:
>
> > Does it mean that currently, if a producer publishes an uncompressed
> > message to a server whose local log format is configured as compressed,
> > consumers will receive compressed messages when fetching?
> >  On Jul 19, 2012 5:08 PM, "Jay Kreps (JIRA)" <j...@apache.org> wrote:
> >
> > >
> > >     [ https://issues.apache.org/jira/browse/KAFKA-406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418395#comment-13418395 ]
> > >
> > > Jay Kreps commented on KAFKA-406:
> > > ---------------------------------
> > >
> > > Oh yes, and the other design requirement we had was that messages not be
> > > re-compressed on a fetch request. A simple implementation without this
> > > requirement would be to have the consumer request N messages, specifying
> > > whether to compress or not, and have the server read these into memory,
> > > decompress them if its local log format is compressed, then batch-compress
> > > exactly the messages the client asked for and send just that. The problem
> > > with this is that we have about a 5x read-to-write ratio, so recompressing
> > > on each read means recompressing the same stuff 5 times on average. This
> > > makes consumption way more expensive. I don't think this is a hard
> > > requirement, but to make that approach fly we would have to demonstrate
> > > that the CPU overhead of compression would not become a serious
> > > bottleneck. I know this won't work with GZIP, but it might be possible
> > > with Snappy or a faster compression algo.
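
(And to see where that cost would land, the naive path is roughly the
following -- a toy sketch reusing the hypothetical gzip/gunzip helpers from
above; firstNMessages is a made-up framing helper, not real broker code:)

    // Naive fetch: on *every* read, decompress the stored batch, slice out
    // exactly the n messages the client asked for, and batch-compress just
    // those. At a 5:1 read-to-write ratio the recompress step runs ~5 times
    // per batch, versus once on the produce path.
    def naiveFetch(storedGzippedBatch: Array[Byte], n: Int,
                   clientWantsGzip: Boolean): Array[Byte] = {
      val plain  = gunzip(storedGzippedBatch)
      val wanted = firstNMessages(plain, n) // hypothetical framing helper
      if (clientWantsGzip) gzip(wanted) else wanted
    }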
> > >
> > > > Gzipped payload is a fully wrapped Message (with headers), not just
> > > > payload
> > > > ---------------------------------------------------------------------------
> > > >
> > > >                 Key: KAFKA-406
> > > >                 URL: https://issues.apache.org/jira/browse/KAFKA-406
> > > >             Project: Kafka
> > > >          Issue Type: Bug
> > > >          Components: core
> > > >    Affects Versions: 0.7.1
> > > >         Environment: N/A
> > > >            Reporter: Lorenzo Alberton
> > > >
> > > > When creating a gzipped MessageSet, the collection of Messages is passed
> > > > to CompressionUtils.compress(), where each message is serialised [1] into
> > > > a buffer (not just the payload, the full Message with headers,
> > > > uncompressed), then gzipped, and finally wrapped into another Message [2].
> > > > In other words, the consumer has to unwrap the Message flagged as
> > > > gzipped, unzip the payload, and unwrap the unzipped payload again as a
> > > > non-compressed Message.
> > > > Is this double-wrapping the intended behaviour?
> > > > [1] messages.foreach(m => m.serializeTo(messageByteBuffer))
> > > > [2] new Message(outputStream.toByteArray, compressionCodec)
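
A stripped-down model of what [1] and [2] produce, and of the double unwrap
the consumer then has to do (toy types only: a 1-byte header stands in for
the real magic/attributes/CRC header, and gzip/gunzip are the hypothetical
helpers sketched earlier):

    // Toy stand-in for kafka.message.Message: 1-byte "header" + payload.
    case class Msg(codec: Byte, payload: Array[Byte]) {
      def serialize: Array[Byte] = codec +: payload
    }

    // [1] + [2]: serialise the *full* messages (header included), gzip the
    // concatenation, then wrap the result in one outer Message flagged as
    // gzipped.
    def compressSet(msgs: Seq[Msg]): Msg =
      Msg(codec = 1, payload = gzip(msgs.flatMap(_.serialize).toArray))

    // Consumer side, first unwrap: take off the outer Message and gunzip.
    // The bytes that come out are still framed inner Messages, which must
    // be parsed as Messages *again* before the raw payloads appear.
    def unwrapOnce(outer: Msg): Array[Byte] = {
      require(outer.codec == 1)  // flagged gzipped
      gunzip(outer.payload)      // framed Msg entries, not raw payloads yet
    }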
> > >
