[ https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284004#comment-15284004 ]

Jun Rao commented on KAFKA-1981:
--------------------------------

[~ewasserman], thanks for the KIP. This seems like a useful feature. Another 
use case for this is handling application mistakes. For example, a compacted 
topic could be the source of truth for certain types of data. If there is an 
application error, incorrect data can be published to the topic; when 
compaction is triggered, that incorrect data can wipe out the last correct 
message associated with a key. Being able to delay compaction by a 
configurable amount of time would let a user preserve those last known 
correct messages long enough for the application error to be discovered. 
Could you include this use case in the KIP? Do you want to start the 
discussion of this KIP on the mailing list?
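The delayed-compaction behavior under discussion can be sketched as follows. This is a hypothetical illustration, not Kafka's implementation: the function name and the `min_compaction_lag_ms` parameter mirror the KIP's proposed time bound, and the segment data is made up.

```python
def first_uncleanable_offset(segments, min_compaction_lag_ms, now_ms):
    """Return the offset at which compaction must stop.

    segments: list of (base_offset, largest_timestamp_ms), oldest first,
    excluding the active segment (which is never compacted).
    The cutoff is the base offset of the first segment whose newest message
    is younger than the configured lag; None means every listed segment is
    old enough to compact.
    """
    for base_offset, largest_ts in segments:
        if now_ms - largest_ts < min_compaction_lag_ms:
            return base_offset
    return None

now = 10_000_000
segments = [(0,   now - 7_200_000),   # newest message ~2 hours old
            (100, now - 5_400_000),   # ~1.5 hours old
            (200, now - 600_000)]     # ~10 minutes old

# With a 1-hour lag, compaction stops at offset 200, so any consumer no
# more than an hour behind still sees every message before that offset.
print(first_uncleanable_offset(segments, 3_600_000, now))  # → 200
```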

> Make log compaction point configurable
> --------------------------------------
>
>                 Key: KAFKA-1981
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1981
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.8.2.0
>            Reporter: Jay Kreps
>              Labels: newbie++
>         Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently, if you enable log compaction, the compactor kicks in whenever 
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. 
> We also never compact the active segment (since it is still being written 
> to). Beyond this we don't give you fine-grained control over when 
> compaction occurs. The result is that you can't guarantee that a consumer 
> will get every update to a compacted topic--if the consumer falls behind a 
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable 
> so you could set a message-count, size, or time bound for compaction. 
> This would let you say, for example, "any consumer that is no more than 1 
> hour behind will get every message."
> This should be relatively easy to implement, since it only affects the 
> end point the compactor considers available for compaction. I think we 
> already have that concept, so this would just add some overrides when 
> calculating it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)