Jay Kreps created KAFKA-1981:
--------------------------------
Summary: Make log compaction point configurable
Key: KAFKA-1981
URL: https://issues.apache.org/jira/browse/KAFKA-1981
Project: Kafka
Issue Type: Improvement
Affects Versions: 0.8.2.0
Reporter: Jay Kreps
Currently if you enable log compaction the compactor will kick in whenever you
hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. Beyond
that we don't give you any fine-grained control over when compaction occurs.
In addition we never compact the active segment (since it is still being
written to).
The result is that you can't really guarantee that a consumer will get
every update to a compacted topic--if the consumer falls behind a bit it might
just get the compacted version.
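For reference, the dirty-ratio threshold described above is the existing
cleaner setting (0.5 is the default, meaning compaction can start once half
of the log is uncompacted):

    # broker-level cleaner config (existing)
    log.cleaner.min.cleanable.ratio=0.5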
This is usually fine, but it would be nice to make this more configurable so
you could set a message-count, size, or time bound for compaction.
This would let you say, for example, "any consumer that is no more than 1 hour
behind will get every message."
This should be relatively easy to implement since it just impacts the end-point
the compactor considers available for compaction. I think we already have that
concept, so this would just mean adding some overrides to factor in when
calculating that point.
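To illustrate, the proposed bounds might look something like the following
topic-level overrides (all three names here are hypothetical, sketched only to
show the shape of the idea; none of them exist today):

    # hypothetical topic-level overrides -- illustrative names only
    min.compaction.lag.ms=3600000        # time bound: don't compact records newer than 1 hour
    min.compaction.lag.messages=100000   # message-count bound behind the log end
    min.compaction.lag.bytes=104857600   # size bound (100 MB) behind the log end

Each override would simply pull the compaction end-point back from the current
boundary, giving consumers within that lag a guarantee of seeing every update.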
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)