Hi Matthias,

bq. Why do we need two new configs? Why is the topic config `compaction.strategy` not sufficient?

As I understand these configurations, one allows you to configure the default for all topics, while the other allows you to configure a single topic directly. If this is incorrect, or if having a global toggle is not desired, then I have no issues with having only the topic-level configuration.

bq. For Kafka Streams we did think about a timestamp base compaction at some point (internal brain storming)---we never thought this through in details, but it might be a good idea to discuss it in this KIP and maybe piggy-back it if we want it (as a second pre-defined strategy "timestamp" next to "offset"?)

The reason I went for a “long” value here was mainly to support the two most common versioning patterns: incrementing numbers and timestamps (a long representing milliseconds since 00:00 GMT on January 1, 1970). Is this not enough to represent the strategy you had in mind? I would love to hear more about those discussions, so that this KIP can fulfil requirements that I am not aware of at the moment.

bq. With the header approach it is not ensured that each record uses a unique "compaction value" (in contrast to offsets). Thus, what should the behaviour be, if two messages have the same "compaction value" in the header? (For timestamps, there is the same issue, and one idea was to use the offset as tie-breaker)

Sorry, I forgot to mention that in the KIP. In the pull request that accompanies the KIP, you can see that it does use the offset as a tie-breaker when the header values are the same. I’ll make this clear by adding it to the proposed changes.

bq. What should the behaviour be, if a message does not encode the "compaction key" in the header?

The intention is that if both records being compared don’t have this value, then the offset is used instead.
However, if only one of these records doesn’t have it, then whichever record has a “compaction key” is kept (as the other is considered to be anomalous). I’ll also add this to the proposed changes in the KIP to highlight these fall-back behaviours.

Thank you for the feedback, and I look forward to more replies!

Cheers

From: Matthias J. Sax
Sent: 08 April 2018 05:29
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-280: Enhanced log compaction

Luís,

thanks a lot for this KIP. Very interesting idea.

Couple of questions:

- Why do we need two new configs? Why is the topic config `compaction.strategy` not sufficient?

- For Kafka Streams we did think about a timestamp base compaction at some point (internal brain storming)---we never thought this through in details, but it might be a good idea to discuss it in this KIP and maybe piggy-back it if we want it (as a second pre-defined strategy "timestamp" next to "offset"?)

- With the header approach it is not ensured that each record uses a unique "compaction value" (in contrast to offsets). Thus, what should the behavior be, if two messages have the same "compaction value" in the header? (For timestamps, there is the same issue, and one idea was to use the offset as tie-breaker)

- What should the behavior be, if a message does not encode the "compaction key" in the header?

-Matthias

On 4/5/18 11:59 PM, Luís Cabral wrote:
>
> Thank you very much for taking the time to read it.
>
> bq. In the 'Proposed Changes' section, can you expand 'OCC' ?
> I've made the 'OCC' into a link pointing to the appropriate Wiki page
> explaining what it is. This is not a particularly important part of the
> change, it is just to reference the similarity between this proposal and the
> version control offered by OCC.
>
> bq. Is it possible to enumerate the keys ?
> Do you mean hard-coding the header key used, rather than using a free-text
> solution?
> If I were to hard-code a header with the key "version", for example, then
> this may conflict with other clients that already use this header for
> something else, making it cumbersome for them to try and use this strategy,
> should they want it.
> If I misunderstood your points, then please correct me. I appreciate the
> feedback!
>
> On Thursday, April 5, 2018, 5:13:47 PM GMT+2, Ted Yu
> <yuzhih...@gmail.com> wrote:
>
> In the 'Proposed Changes' section, can you expand 'OCC' ?
>
> bq. Specifically changing this to anything other than "*offset*"
>
> Is it possible to enumerate the keys ? In the future, more metadata would
> be defined in record header - it is better to avoid collision.
>
> Cheers
>
> On Thu, Apr 5, 2018 at 2:05 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
> wrote:
>
>> This is embarrassingly hard to fix... going again...
>> ----
>> KIP-280: https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>> ----
>> Pull-4822: https://github.com/apache/kafka/pull/4822
>>
>> On Thursday, April 5, 2018, 11:03:22 AM GMT+2, Luís Cabral
>> <luis_cab...@yahoo.com.INVALID> wrote:
>>
>> Fixing the links:
>> KIP-280: https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>> Pull-4822: https://github.com/apache/kafka/pull/4822
>>
>> On 2018/04/05 08:44:00, Luís Cabral <l...@yahoo.com.INVALID> wrote:
>>> Hello all,
>>> Starting a discussion for this feature.
>>> KIP-280: https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
>>> Pull-4822: https://github.com/apache/kafka/pull/4822
>>> Kind Regards,
>>> Luís
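
For reference, the tie-breaker and fall-back rules described in the reply at the top of this thread can be sketched roughly as follows. This is only an illustration and not the code from the pull request: the `Rec` class and the `shouldReplace` method are hypothetical names, and the sketch assumes that the larger header value is the one to keep. Only the offset tie-breaker and the missing-header behaviour are taken from the description above.

```java
import java.util.OptionalLong;

class CompactionChoice {

    /** Hypothetical stand-in for a log record; this is not a Kafka class. */
    static final class Rec {
        final long offset;
        final OptionalLong version; // empty when the record has no "compaction value" header

        private Rec(long offset, OptionalLong version) {
            this.offset = offset;
            this.version = version;
        }

        static Rec of(long offset, long version) {
            return new Rec(offset, OptionalLong.of(version));
        }

        static Rec withoutVersion(long offset) {
            return new Rec(offset, OptionalLong.empty());
        }
    }

    /** Returns true if {@code candidate} should replace {@code current} as the record kept for a key. */
    static boolean shouldReplace(Rec current, Rec candidate) {
        boolean currentHas = current.version.isPresent();
        boolean candidateHas = candidate.version.isPresent();

        // Only one record carries the header: keep that one, the other is treated as anomalous.
        if (currentHas != candidateHas) {
            return candidateHas;
        }
        // Both carry the header: the larger value wins (assumption), unless the values are equal.
        if (currentHas) {
            int cmp = Long.compare(candidate.version.getAsLong(), current.version.getAsLong());
            if (cmp != 0) {
                return cmp > 0;
            }
        }
        // Equal header values, or no header on either record: fall back to the offset.
        return candidate.offset > current.offset;
    }

    public static void main(String[] args) {
        System.out.println(shouldReplace(Rec.of(1, 7), Rec.of(2, 5)));          // false: lower header value
        System.out.println(shouldReplace(Rec.of(1, 5), Rec.of(2, 5)));          // true: tie, higher offset
        System.out.println(shouldReplace(Rec.of(1, 5), Rec.withoutVersion(2))); // false: header missing
    }
}
```

Keeping the three cases in a single comparison makes it easy to see that plain offset-based compaction is simply the case where neither record carries the header.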