RE: [DISCUSS] KIP-280: Enhanced log compaction

Luís Cabral Sun, 08 Apr 2018 03:45:07 -0700

Hi Matthias,

bq. Why do we need two new configs? Why is the topic config 
`compaction.strategy` not sufficient?

As I understand it, one configuration sets the broker-wide default for all 
topics, while the other overrides it for a single topic.
If this is incorrect, or if a global toggle is not desired, then I have no 
objection to keeping only the topic-level configuration.
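
To make the two-level arrangement concrete, here is a minimal Python sketch of how a per-topic `compaction.strategy` override could fall back to a broker-wide default. The broker-level key `log.cleaner.compaction.strategy`, the helper `resolve_compaction_strategy`, and the final fall-back to "offset" are assumptions for illustration, not names or behaviour confirmed in this thread.

```python
# Only `compaction.strategy` appears in the thread; the broker-wide key is hypothetical.
TOPIC_KEY = "compaction.strategy"
BROKER_DEFAULT_KEY = "log.cleaner.compaction.strategy"


def resolve_compaction_strategy(topic_config: dict, broker_config: dict) -> str:
    """Return the effective strategy: topic override, then broker default, then "offset"."""
    if TOPIC_KEY in topic_config:
        return topic_config[TOPIC_KEY]
    return broker_config.get(BROKER_DEFAULT_KEY, "offset")


broker = {BROKER_DEFAULT_KEY: "timestamp"}
print(resolve_compaction_strategy({}, broker))                      # timestamp
print(resolve_compaction_strategy({TOPIC_KEY: "version"}, broker))  # version
print(resolve_compaction_strategy({}, {}))                          # offset
```

This mirrors the existing pairing of broker-level defaults with topic-level overrides in Kafka (for example `log.cleanup.policy` and `cleanup.policy`); dropping the global toggle would leave only the first branch.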

bq. For Kafka Streams we did think about a timestamp-based compaction at some 
point (internal brainstorming)---we never thought this through in detail, but 
it might be a good idea to discuss it in this KIP and maybe piggy-back it if we 
want it (as a second pre-defined strategy "timestamp" next to "offset"?)

I chose a “long” value here mainly to support the two most common versioning 
patterns: incrementing counters and timestamps (a long representing 
milliseconds since 00:00, January 1, 1970 GMT).
Is this not enough to represent the strategy you had in mind? I would love to 
hear more about those discussions, so that this KIP can also cover requirements 
I am not yet aware of.
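
As an illustration of why a single long covers both patterns, here is a minimal Python sketch that stores a version in a record header and reads it back under a configurable header key. The 8-byte big-endian encoding, the header key "entity-version", and the helpers `encode_version` and `read_version` are all assumptions; the thread does not specify how the pull request encodes the header value.

```python
import struct
import time
from typing import List, Optional, Tuple


def encode_version(version: int) -> bytes:
    """Encode a version as an 8-byte big-endian signed long (assumed wire format)."""
    return struct.pack(">q", version)


def read_version(headers: List[Tuple[str, bytes]], header_key: str) -> Optional[int]:
    """Return the first 8-byte value stored under header_key, decoded as a long, or None."""
    for key, value in headers:
        if key == header_key and value is not None and len(value) == 8:
            return struct.unpack(">q", value)[0]
    return None


# Pattern 1: an incrementing counter kept by the producer.
counter_headers = [("entity-version", encode_version(42))]
# Pattern 2: milliseconds since the Unix epoch (1970-01-01T00:00:00Z).
timestamp_headers = [("entity-version", encode_version(int(time.time() * 1000)))]

print(read_version(counter_headers, "entity-version"))  # 42
print(read_version(counter_headers, "version"))         # None
```

Because the header key is a parameter rather than a fixed name, an application that already uses a "version" header for something else can point the strategy at a different key.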

bq. With the header approach it is not ensured that each record uses a unique 
"compaction value" (in contrast to offsets). Thus, what should the behaviour 
be, if two messages have the same "compaction value" in the header? (For 
timestamps, there is the same issue, and one idea was to use the offset as 
tie-breaker)

Sorry, I forgot to mention that in the KIP. In the pull request accompanying 
the KIP, you can see that it does use the offset as a tie-breaker when the 
header values are the same.
I’ll make this explicit by adding it to the proposed changes.
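
The tie-breaking rule can be sketched in a few lines of Python: compare records by (version, offset), so the offset only matters when the header values are equal. The `Record` type and the `newer` helper are hypothetical, and the assumption that the higher (later) offset wins a tie is mine, chosen to match plain offset-based compaction; the thread only says the offset is the tie-breaker.

```python
from typing import NamedTuple


class Record(NamedTuple):
    offset: int   # position in the partition log; unique per record
    version: int  # value read from the compaction header


def newer(a: Record, b: Record) -> Record:
    """Return the record to retain: higher version wins, equal versions fall back to higher offset."""
    return max(a, b, key=lambda r: (r.version, r.offset))


print(newer(Record(offset=10, version=7), Record(offset=12, version=5)))
# Record(offset=10, version=7)
print(newer(Record(offset=10, version=7), Record(offset=12, version=7)))
# Record(offset=12, version=7)
```

Since offsets are unique within a partition, the (version, offset) key never ties, so the comparison is always decisive.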

bq. What should the behaviour be, if a message does not encode the "compaction 
key" in the header?

The intention is that if neither of the two records being compared has this 
value, the offset is used instead. However, if only one of them lacks it, the 
record that has a “compaction key” is kept (the other is considered 
anomalous).
I’ll also add these fall-back behaviours to the proposed changes in the KIP.
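
The fall-back rules above can be summarised as a small decision function. This is a minimal Python sketch, not the code from the pull request: `Record`, `record_to_keep`, and the use of `None` for a missing header are hypothetical, and it assumes the higher offset wins when offsets decide.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    version: Optional[int]  # None when the record lacks the compaction header


def record_to_keep(a: Record, b: Record) -> Record:
    """Choose which of two records with the same key survives compaction."""
    if a.version is None and b.version is None:
        # Neither record carries the header: plain offset-based compaction.
        return a if a.offset > b.offset else b
    if a.version is None:
        # Only b has a version, so a is treated as anomalous.
        return b
    if b.version is None:
        return a
    # Both have versions: the higher version wins and the offset breaks ties.
    return max(a, b, key=lambda r: (r.version, r.offset))


print(record_to_keep(Record(offset=20, version=None), Record(offset=3, version=9)))
# Record(offset=3, version=9)
```

One consequence worth stating in the KIP: a newer record without the header loses to an older record that has one, so producers switching to this strategy need to set the header consistently.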

Thank you for the feedback, and I look forward to more replies!

Cheers

From: Matthias J. Sax
Sent: 08 April 2018 05:29
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Luís,

thanks a lot for this KIP. Very interesting idea.

Couple of questions:

 - Why do we need two new configs? Why is the topic config
`compaction.strategy` not sufficient?

 - For Kafka Streams we did think about a timestamp-based compaction at
some point (internal brainstorming)---we never thought this through in
detail, but it might be a good idea to discuss it in this KIP and maybe
piggy-back it if we want it (as a second pre-defined strategy
"timestamp" next to "offset"?)

 - With the header approach it is not ensured that each record uses a
unique "compaction value" (in contrast to offsets). Thus, what should
the behavior be, if two messages have the same "compaction value" in the
header? (For timestamps, there is the same issue, and one idea was to
use the offset as tie-breaker)

 - What should the behavior be, if a message does not encode the
"compaction key" in the header?

-Matthias

On 4/5/18 11:59 PM, Luís Cabral wrote:
>  
> Thank you very much for taking the time to read it.
> 
> bq. In the 'Proposed Changes' section, can you expand 'OCC'?
> I've turned 'OCC' into a link to the relevant Wiki page explaining what it 
> is (optimistic concurrency control). This is not a particularly important 
> part of the change; it only points out the similarity between this proposal 
> and the version control offered by OCC.
> 
> bq. Is it possible to enumerate the keys?
> Do you mean hard-coding the header key, rather than letting it be free text? 
> If I were to hard-code the header key as "version", for example, it could 
> conflict with clients that already use this header for something else, 
> making it cumbersome for them to adopt this strategy, should they want it.
> If I misunderstood your points, please correct me. I appreciate the 
> feedback!
> 
> On Thursday, April 5, 2018, 5:13:47 PM GMT+2, Ted Yu 
> <yuzhih...@gmail.com> wrote:  
>  
> In the 'Proposed Changes' section, can you expand 'OCC'?
> 
> bq. Specifically changing this to anything other than "*offset*"
> 
> Is it possible to enumerate the keys? In the future, more metadata may
> be defined in record headers, so it is better to avoid collisions.
> 
> Cheers
> 
> On Thu, Apr 5, 2018 at 2:05 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
> wrote:
> 
>>
>> This is embarrassingly hard to fix... going again...
>> ----
>> KIP-280: https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>> -----
>> Pull-4822: https://github.com/apache/kafka/pull/4822
>>
>>
>>     On Thursday, April 5, 2018, 11:03:22 AM GMT+2, Luís Cabral
>> <luis_cab...@yahoo.com.INVALID> wrote:
>>
>> Fixing the links:
>> KIP-280: https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>> Pull-4822: https://github.com/apache/kafka/pull/4822
>>
>>
>> On 2018/04/05 08:44:00, Luís Cabral <l...@yahoo.com.INVALID> wrote:
>>> Hello all,
>>> Starting a discussion for this feature.
>>> KIP-280: https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>>> Pull-4822: https://github.com/apache/kafka/pull/4822
>>>
>>> Kind Regards,
>>> Luís
>>
>>
