[
https://issues.apache.org/jira/browse/KAFKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jay Kreps updated KAFKA-595:
----------------------------
Comment: was deleted
(was: I think saying it is unnecessary is perhaps overstating it. It depends
on what you are trying to optimize. Compression trades client CPU for network
bandwidth. For our own use case I don't know whether that trade is worth it;
it depends on the CPU usage of compression, the compression ratio, and the
relative availability of network bandwidth. The CPU usage isn't necessarily
fixed--a cheaper compression algorithm than GZIP, plus a little work on the
compression code to avoid recopies and deep iteration, could significantly
reduce the CPU cost on the broker.
I would instead rephrase this as a feature request--"Decouple producer
compression from broker compression." Since we are going to recompress anyway,
this is super easy to implement. Right now we have a kind of odd heuristic
which says "if there is at least one compressed message in a given message
set, recompress the entire message set using the last compression codec that
appears in the message set".
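The heuristic described above can be sketched roughly as follows. This is an illustrative reconstruction, not Kafka's actual code; the `Codec` enum and `targetCodec` method are hypothetical names.

```java
import java.util.List;

// Illustrative codecs; Kafka 0.8 supported at least NONE, GZIP, and Snappy.
enum Codec { NONE, GZIP, SNAPPY }

final class TargetCodecHeuristic {
    // Sketch of the heuristic: if at least one message in the set is
    // compressed, recompress the whole set with the *last* compression
    // codec that appears; otherwise leave the set uncompressed.
    static Codec targetCodec(List<Codec> messageCodecs) {
        Codec target = Codec.NONE;
        for (Codec c : messageCodecs) {
            if (c != Codec.NONE) {
                target = c;  // last compressed codec seen wins
            }
        }
        return target;
    }
}
```

Note how arbitrary this is: a set containing one GZIP message followed by one Snappy message comes out Snappy, while the reverse order comes out GZIP.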
I would recommend we instead add a log.compression.codec property (plus a
per-topic override map) that controls the compression on the broker. This
could be set to match the producer's codec or not. I don't think we
necessarily need to support the current behavior of retaining whatever the
producer uses--this behavior is actually kind of bad since it means consumers
must support EVERY codec ANY producer happens to send. The broker would always
apply the configured compression codec to incoming messages regardless of the
source compression format.)
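The proposal above might look something like this in broker configuration. Both property names here are hypothetical sketches from the comment, not settings that existed at the time:

```properties
# Global default codec the broker recompresses with, regardless of what
# the producer sent (hypothetical name from the comment above).
log.compression.codec=gzip

# Hypothetical per-topic override map: these topics deviate from the default.
log.compression.codec.overrides=clicks:snappy,audit:none
```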
> Decouple producer side compression from server-side compression.
> ----------------------------------------------------------------
>
> Key: KAFKA-595
> URL: https://issues.apache.org/jira/browse/KAFKA-595
> Project: Kafka
> Issue Type: Improvement
> Affects Versions: 0.8
> Reporter: Neha Narkhede
> Labels: feature
>
> In 0.7 Kafka always appended messages to the log using whatever compression
> codec the client used. In 0.8, after the KAFKA-506 patch, the master always
> recompresses the message before appending to the log to assign ids. Currently
> the server uses a funky heuristic to choose a compression codec based on the
> codecs the producer used. This doesn't actually make much sense. It would
> be better for the server to have its own compression setting (a global
> default and a per-topic override) that specifies the compression codec, and
> have the server always recompress with this codec regardless of the
> original codec.
> Compression currently happens in kafka.log.Log.assignOffsets (perhaps should
> be renamed if it takes on compression as an official responsibility instead
> of a side-effect).
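The "global default and per-topic override" lookup described in the quoted issue could be resolved as sketched below. Class and method names are hypothetical; this only illustrates the proposed config resolution, not Kafka's implementation.

```java
import java.util.Map;

// Hypothetical broker-side codec configuration: a global default with
// per-topic overrides, as proposed in the issue description above.
final class BrokerCodecConfig {
    private final String defaultCodec;
    private final Map<String, String> perTopicOverrides;

    BrokerCodecConfig(String defaultCodec, Map<String, String> perTopicOverrides) {
        this.defaultCodec = defaultCodec;
        this.perTopicOverrides = perTopicOverrides;
    }

    // The broker would recompress every incoming message set with this
    // codec, ignoring whatever codec the producer happened to use.
    String codecFor(String topic) {
        return perTopicOverrides.getOrDefault(topic, defaultCodec);
    }
}
```

Consumers then only ever need to support the codecs the broker operator configured, rather than every codec any producer might send.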
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira