[
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219089#comment-15219089
]
ASF GitHub Bot commented on KAFKA-1981:
---------------------------------------
GitHub user ewasserman opened a pull request:
https://github.com/apache/kafka/pull/1168
KAFKA-1981 Make log compaction point configurable
@jkreps
Implements control over the portion of the head of the log that will not be
compacted (i.e. preserved in detail).
The log cleaner can be configured to retain a minimum amount of the
uncompacted "head" of the log.
This is enabled by setting one or more of the compaction lags:
log.cleaner.min.compaction.lag.ms
log.cleaner.min.compaction.lag.bytes
log.cleaner.min.compaction.lag.messages
with similar per topic configurations:
min.compaction.lag.ms
min.compaction.lag.bytes
min.compaction.lag.messages
These can be used to set constraints on the minimum message age, aggregate
size, and/or count respectively that may be compacted. If none are set, all log
segments are eligible for compaction except for the last segment, i.e. the one
currently being written to. The active segment will not be compacted even if
all of the compaction lag constraints are satisfied.
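As a rough illustration of how the three lag constraints could combine, here is a minimal Python sketch. It is not the code in this pull request: the `Segment` type and the `first_uncleanable_index` function are hypothetical names, the segment's last-modified time stands in for message age, and the byte and message lags are assumed to count the active segment as part of the protected head.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for a log segment (not Kafka's class)."""
    base_offset: int       # offset of the first message in the segment
    message_count: int     # number of messages in the segment
    size_bytes: int        # on-disk size of the segment
    last_modified_ms: int  # time of the newest write, used as an age proxy


def first_uncleanable_index(segments: List[Segment], now_ms: int,
                            lag_ms: int = 0, lag_bytes: int = 0,
                            lag_messages: int = 0) -> int:
    """Return the index of the oldest segment that must NOT be compacted.

    Segments are ordered oldest to newest; segments[:result] are eligible.
    """
    bytes_after = 0      # bytes held by segments newer than the current one
    messages_after = 0   # messages held by segments newer than the current one
    # Walk from the newest segment toward the oldest and stop at the first
    # segment that satisfies every configured lag.
    for i in range(len(segments) - 1, -1, -1):
        seg = segments[i]
        is_active = i == len(segments) - 1  # active segment is never eligible
        old_enough = now_ms - seg.last_modified_ms >= lag_ms
        if (not is_active and old_enough
                and bytes_after >= lag_bytes
                and messages_after >= lag_messages):
            return i + 1
        bytes_after += seg.size_bytes
        messages_after += seg.message_count
    return 0  # nothing is eligible yet


HOUR_MS = 3_600_000
now = 10 * HOUR_MS
log = [
    Segment(0, 100, 1000, now - 3 * HOUR_MS),
    Segment(100, 100, 1000, now - 2 * HOUR_MS),
    Segment(200, 100, 1000, now - HOUR_MS // 2),
    Segment(300, 100, 1000, now),  # active segment
]
# With a one-hour time lag, only the two oldest segments are eligible.
print(first_uncleanable_index(log, now, lag_ms=HOUR_MS))  # -> 2
```

With no lags set the sketch returns 3 for the same log, i.e. everything except the active segment is eligible, which matches the default behavior described above.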
In particular this allows for the example use case described in the JIRA:
"any consumer that is no more than 1 hour behind will get every message."
This contribution is my (Eric Wasserman's) original work and I license the
work to the Kafka project under its open source license.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ewasserman/kafka feat-compaction-lag
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/1168.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1168
----
commit d9fa5ca1c6b2f9e08689697b85b9b54a90afa67c
Author: Eric Wasserman <[email protected]>
Date: 2016-03-15T00:57:07Z
log
commit b9d2752b6db36a7d9464e8b4c12ec0d9cd8dfcf7
Author: Eric Wasserman <[email protected]>
Date: 2016-03-16T03:46:07Z
add compaction lag
commit dbb57bcf4c1d0fdf5c15c7f09838a6f47f64bd29
Author: Eric Wasserman <[email protected]>
Date: 2016-03-17T00:56:57Z
tests
commit ffa37a2d296bceb612152316ac163d67e1d71fe0
Author: Eric Wasserman <[email protected]>
Date: 2016-03-17T20:04:25Z
clean up test
commit f0536ae2612338460f07377afb388b64a84c7972
Author: Eric Wasserman <[email protected]>
Date: 2016-03-18T00:54:13Z
integration test with lag
commit 513a43296b99df2d0426abe9d09f942a95e02274
Author: Eric Wasserman <[email protected]>
Date: 2016-03-22T00:56:02Z
lag integration test
commit 7746924e1fb5d964fe609e941fec5611828e8720
Author: Eric Wasserman <[email protected]>
Date: 2016-03-23T17:22:08Z
add size and message count properties; update property names
commit 8ae2a1daf0cd970bc38d8654f73f9175ee93433e
Author: Eric Wasserman <[email protected]>
Date: 2016-03-23T23:36:39Z
final tests
commit 7b41064eca6a45ad1f4a74c2ad0e3d6cec8f2c25
Author: Eric Wasserman <[email protected]>
Date: 2016-03-23T23:37:33Z
generalized compaction lag
commit b5c247be267918f3c4f1163d81e95fb93cd6dc01
Author: Eric Wasserman <[email protected]>
Date: 2016-03-24T00:45:32Z
update documentation
commit 118fa196f41a8fa0198c672288433cfce30a322c
Author: Eric Wasserman <[email protected]>
Date: 2016-03-24T00:48:35Z
add missing license comment
commit fa0b96e984a327968824894859ae9c97e4d0ed0d
Author: Eric Wasserman <[email protected]>
Date: 2016-03-24T16:44:39Z
reverse log4j
commit c4c0d97cb02104e8aea73df6bc3305914768192a
Author: Eric Wasserman <[email protected]>
Date: 2016-03-16T03:46:07Z
add compaction lag
commit 45a02f90aeb378ccd51e25dac4b19a1bfabfd79b
Author: Eric Wasserman <[email protected]>
Date: 2016-03-24T23:09:46Z
Merge branch 'feat-compaction-lag' of github.com:ewasserman/kafka into
feat-compaction-lag
commit 49051027872443d2a38141893dab2d6d68b7c310
Author: Eric Wasserman <[email protected]>
Date: 2016-03-30T23:10:33Z
Merge branch 'trunk' into feat-compaction-lag
commit c56ccd517ad3d414c4aee891975693cffc794359
Author: Eric Wasserman <[email protected]>
Date: 2016-03-30T23:37:00Z
adapt to changes in test utils.
----
> Make log compaction point configurable
> --------------------------------------
>
> Key: KAFKA-1981
> URL: https://issues.apache.org/jira/browse/KAFKA-1981
> Project: Kafka
> Issue Type: Improvement
> Affects Versions: 0.8.2.0
> Reporter: Jay Kreps
> Labels: newbie++
>
> Currently if you enable log compaction the compactor will kick in whenever
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted.
> Other than this we don't give you fine-grained control over when compaction
> occurs. In addition we never compact the active segment (since it is still
> being written to).
> As a result, you can't really guarantee that a consumer
> will get every update to a compacted topic--if the consumer falls behind a
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so
> you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the
> end-point the compactor considers available for compaction. I think we
> already have that concept, so this would just be some other overrides to add
> in when calculating that.