[
https://issues.apache.org/jira/browse/KAFKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jay Kreps updated KAFKA-595:
----------------------------
Comment: was deleted
(was: I think saying it is unnecessary is perhaps overstating it. It depends
on what you are trying to optimize. Compression trades client CPU for network
bandwidth. For our own use case I don't know whether that trade is worth it;
it depends on the CPU usage of compression, the compression ratio, and the
relative availability of network bandwidth. The CPU usage isn't necessarily
fixed--a cheaper compression algorithm than GZIP, plus a little work on the
compression code to avoid recopies and deep iteration, could significantly
reduce the CPU cost on the broker.
I would instead rephrase this as a feature request--"Decouple producer
compression from broker compression." Since we are going to recompress anyway,
this is super easy to implement. Right now we have a kind of odd heuristic
which says "if there is at least one compressed message in a given message
set, recompress the entire message set using the last compression codec that
appears in the message set".
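The heuristic described above can be sketched roughly as follows. This is an illustrative reconstruction, not Kafka's actual code; the `Codec` enum and `targetCodec` method are hypothetical names.

```java
import java.util.List;

// Illustrative codecs; Kafka 0.8 supported at least NONE, GZIP, and Snappy.
enum Codec { NONE, GZIP, SNAPPY }

final class TargetCodecHeuristic {
    // Sketch of the heuristic: if at least one message in the set is
    // compressed, recompress the whole set with the *last* compression
    // codec that appears; otherwise leave the set uncompressed.
    static Codec targetCodec(List<Codec> messageCodecs) {
        Codec target = Codec.NONE;
        for (Codec c : messageCodecs) {
            if (c != Codec.NONE) {
                target = c;  // last compressed codec seen wins
            }
        }
        return target;
    }
}
```

Note how arbitrary this is: a set containing one GZIP message followed by one Snappy message comes out Snappy, while the reverse order comes out GZIP.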
I would recommend we instead add a log.compression.codec property (plus a
per-topic override map) that controls the compression on the broker. This
could be set to match the producer's codec or not. I don't think we
necessarily need to support the current behavior of retaining whatever the
producer uses--this behavior is actually kind of bad since it means consumers
must support EVERY codec ANY producer happens to send. The broker would always
apply the configured compression codec to incoming messages regardless of the
source compression format.)
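The proposal above might look something like this in broker configuration. Both property names here are hypothetical sketches from the comment, not settings that existed at the time:

```properties
# Global default codec the broker recompresses with, regardless of what
# the producer sent (hypothetical name from the comment above).
log.compression.codec=gzip

# Hypothetical per-topic override map: these topics deviate from the default.
log.compression.codec.overrides=clicks:snappy,audit:none
```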
> Decouple producer side compression from server-side compression.
> ----------------------------------------------------------------
>
> Key: KAFKA-595
> URL: https://issues.apache.org/jira/browse/KAFKA-595
> Project: Kafka
> Issue Type: Improvement
> Affects Versions: 0.8
> Reporter: Neha Narkhede
> Labels: feature
>
> In 0.7 Kafka always appended messages to the log using whatever compression
> codec the client used. In 0.8, after the KAFKA-506 patch, the master always
> recompresses the message before appending to the log to assign ids. Currently
> the server uses a funky heuristic to choose a compression codec based on the
> codecs the producer used. This doesn't actually make much sense. It would
> be better for the server to have its own compression setting (a global
> default and a per-topic override) that specifies the compression codec, and
> have the server always recompress with this codec regardless of the
> original codec.
> Compression currently happens in kafka.log.Log.assignOffsets (perhaps should
> be renamed if it takes on compression as an official responsibility instead
> of a side-effect).
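The "global default and per-topic override" lookup described in the quoted issue could be resolved as sketched below. Class and method names are hypothetical; this only illustrates the proposed config resolution, not Kafka's implementation.

```java
import java.util.Map;

// Hypothetical broker-side codec configuration: a global default with
// per-topic overrides, as proposed in the issue description above.
final class BrokerCodecConfig {
    private final String defaultCodec;
    private final Map<String, String> perTopicOverrides;

    BrokerCodecConfig(String defaultCodec, Map<String, String> perTopicOverrides) {
        this.defaultCodec = defaultCodec;
        this.perTopicOverrides = perTopicOverrides;
    }

    // The broker would recompress every incoming message set with this
    // codec, ignoring whatever codec the producer happened to use.
    String codecFor(String topic) {
        return perTopicOverrides.getOrDefault(topic, defaultCodec);
    }
}
```

Consumers then only ever need to support the codecs the broker operator configured, rather than every codec any producer might send.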
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira