[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-08-30 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15450387#comment-15450387
 ] 

Eric Wasserman commented on KAFKA-1981:
---

Opened new pull request at:
https://github.com/apache/kafka/pull/1794

@junrao, could you please take a look at this new PR? (I closed the prior one.) I 
switched to using the time index to determine the last message time in the 
LogSegments, as we discussed.
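The time-index lookup mentioned above can be illustrated with a small sketch (all names are illustrative and simplified, not Kafka's actual LogSegment/TimeIndex API):

```python
# Hypothetical sketch: a segment's time index is modeled as an append-only
# list of (timestamp_ms, offset) entries, so the last entry carries the
# segment's largest message timestamp. Falling back to the file's mtime
# mirrors the behavior before time indexes existed.

def largest_message_timestamp(time_index, file_mtime_ms):
    """Last message time of a segment: the final time-index entry's
    timestamp, or the file modification time if the index is empty."""
    if time_index:
        return time_index[-1][0]  # entries are appended in increasing time order
    return file_mtime_ms
```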

> Make log compaction point configurable
> --
>
> Key: KAFKA-1981
> URL: https://issues.apache.org/jira/browse/KAFKA-1981
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>Reporter: Jay Kreps
>  Labels: newbie++
> Attachments: KIP for Kafka Compaction Patch.md
>
>
> Currently if you enable log compaction the compactor will kick in whenever 
> you hit a certain "dirty ratio", i.e. when 50% of your data is uncompacted. 
> Other than this we don't give you fine-grained control over when compaction 
> occurs. In addition we never compact the active segment (since it is still 
> being written to).
> Other than this we don't really give you much control over when compaction 
> will happen. The result is that you can't really guarantee that a consumer 
> will get every update to a compacted topic--if the consumer falls behind a 
> bit it might just get the compacted version.
> This is usually fine, but it would be nice to make this more configurable so 
> you could set either a # messages, size, or time bound for compaction.
> This would let you say, for example, "any consumer that is no more than 1 
> hour behind will get every message."
> This should be relatively easy to implement since it just impacts the 
> end-point the compactor considers available for compaction. I think we 
> already have that concept, so this would just be some other overrides to add 
> in when calculating that.
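As a rough model of the mechanism the description outlines (a simplified sketch with illustrative names, not Kafka's actual configuration or classes):

```python
# Simplified model of the behavior described above: compaction triggers on
# the dirty ratio, and the active (still-being-written) segment is never
# eligible. Names are illustrative, not Kafka's.

def dirty_ratio(clean_bytes, dirty_bytes):
    total = clean_bytes + dirty_bytes
    return dirty_bytes / total if total else 0.0

def should_compact(clean_bytes, dirty_bytes, min_dirty_ratio=0.5):
    # e.g. 50% uncompacted data reaches the default threshold
    return dirty_ratio(clean_bytes, dirty_bytes) >= min_dirty_ratio

def eligible_segments(segments, active_segment):
    # The active segment is excluded because it is still being written to.
    return [s for s in segments if s is not active_segment]
```

The proposal in this ticket amounts to adding further bounds (messages, size, or time) that shrink the eligible range beyond these two rules.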



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-06-30 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357222#comment-15357222
 ] 

Eric Wasserman commented on KAFKA-1981:
---

[~junrao] could you please take a look at the pull request?
https://github.com/apache/kafka/pull/1494




[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-06-22 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345618#comment-15345618
 ] 

Eric Wasserman commented on KAFKA-1981:
---

[~ijuma] do you think you'll have a chance to check out the new PR any time soon?



[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-06-10 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325578#comment-15325578
 ] 

Eric Wasserman commented on KAFKA-1981:
---

[~ijuma] A new pull request is available 
(https://github.com/apache/kafka/pull/1494); could you please review it? Thanks, 
Eric.



[jira] [Comment Edited] (KAFKA-1981) Make log compaction point configurable

2016-06-06 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317588#comment-15317588
 ] 

Eric Wasserman edited comment on KAFKA-1981 at 6/7/16 12:56 AM:


During the KIP-58 vote it was 
[suggested](http://mail-archives.apache.org/mod_mbox/kafka-dev/201605.mbox/%3ccabtagwebxsrveok-unuptjtsdf+d+pq8fuaahql+u9bgaz3...@mail.gmail.com%3e)
 the name of the sole remaining property be changed from:

log.cleaner.min.compaction.lag.ms

to

log.cleaner.compaction.delay.ms

The feature guarantees that the elapsed time between appending a message and its 
being subject to compaction is _at minimum_ _*x*_ milliseconds. This setting 
specifies _*x*_.

In particular, this guarantee does not affect _when_ a compaction will or will 
not happen; it only controls which messages are protected from compaction in the 
event one occurs.

New Oxford American Dictionary defines:

*Lag* n. (also time lag) a period of time between one event or phenomenon and 
another: there was a time lag between the commission of the crime and its 
reporting to the police.

*Delay* n. a period of time by which something is late or postponed: a two-hour 
delay | long delays in obtaining passports.

Seems to me "lag" is closer than "delay" to the meaning we are after.

When considering alternative phrasing we may want to consider that the other 
parameters (cumulative message size, or message count) may later be added back 
into this feature.
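The guarantee can be sketched as follows (a simplified model for illustration; only the property name comes from the KIP, the code is not Kafka's implementation):

```python
# Illustrative sketch of the min-lag guarantee: a message appended at time t
# may not be compacted before t + min_compaction_lag_ms. Only the property
# name is taken from the KIP; the rest is a simplified model.

def first_protected_index(append_times_ms, now_ms, min_compaction_lag_ms):
    """Index of the first message still inside the lag window; messages at
    or after this index are protected from compaction. Returns
    len(append_times_ms) when every message is old enough to compact."""
    cutoff = now_ms - min_compaction_lag_ms
    for i, t in enumerate(append_times_ms):
        if t > cutoff:
            return i
    return len(append_times_ms)
```

Note how this bounds which messages a compaction run may touch, while saying nothing about when a run actually starts.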










[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-05-16 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285617#comment-15285617
 ] 

Eric Wasserman commented on KAFKA-1981:
---

Thanks, I added your use case. I would like to start discussion of the KIP. 
I'll attempt to start a discussion thread on dev@kafka.apache.org, which from 
looking at the archives seems to be the right venue (though I may not have 
permission to start a thread).



[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-05-08 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275920#comment-15275920
 ] 

Eric Wasserman commented on KAFKA-1981:
---

That did it. Thanks. I created 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable




[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-05-06 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275024#comment-15275024
 ] 

Eric Wasserman commented on KAFKA-1981:
---

I think it must be the same as my Jira user: ewasserman



[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-05-06 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274751#comment-15274751
 ] 

Eric Wasserman commented on KAFKA-1981:
---

I attached a KIP to this Jira ticket as I don't have permissions to add one in 
Confluence.



[jira] [Updated] (KAFKA-1981) Make log compaction point configurable

2016-04-20 Thread Eric Wasserman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Wasserman updated KAFKA-1981:
--
Attachment: KIP for Kafka Compaction Patch.md

Attached a KIP, as I cannot add one in Confluence.



[jira] [Commented] (KAFKA-1981) Make log compaction point configurable

2016-04-20 Thread Eric Wasserman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250643#comment-15250643
 ] 

Eric Wasserman commented on KAFKA-1981:
---

I apparently don't have privileges to create a KIP in Confluence (no "Create" 
button). My username is ewasserman. How should I submit it?
