Hi Jun, -: 1. I guess both new configurations will be at the topic level?
They will exist in the global configuration, at the very least. I would like to have them on the topic level as well, but there is an inconsistency between the cleanup/compaction properties that exist “only globally” vs “globally + per topic”. I haven’t gotten around to investigating why, and if that reason would then also impact the properties I’m suggesting. At first glance they seem to belong with the properties that are "only globally” configured, but Guozhang has written earlier with a suggestion of a compaction property that works for both (though I haven’t had the time to look into it yet, unfortunately). -: 2. Since the log cleaner now needs to keep both the offset and another long (say timestamp) in the de-dup map, it reduces the number of keys that we can keep in the map and thus may require more rounds of cleaning. This is probably not a big issue, but it will be useful to document this impact in the KIP. As a reader, I tend to prefer brief documentation on new features (they tend to be too many for me to find the willpower to read a 200-page essay about each one), so that influences me to avoid writing about every micro-impact that may exist, and simply leave it inferred (as you have just done). But I also don’t feel strongly enough about it to argue either way. So, after reading my argument, if you still insist, I’ll happily add this there. -: 3. With the new cleaning strategy, we want to be a bit careful with removing the last message in a partition (which is possible now). We need to preserve the offset of the last message so that we don't reuse the offset for a different message. One way to simply never remove the last message. Another way (suggested by Jason) is to create an empty message batch. That is a good point, but isn’t the last message always kept regardless? In all of my tests with this approach, never have I seen it being removed. This is not because I made it so while changing the code, it was simply like this before, which made me smile! Given these results, I just *assumed* (oops) that these scenarios represented the reality, so the compaction would only affected the “tail”, while the “head” remained untouched. Now that you say its possible that the last message actually gets overwritten somehow, I guess a new bullet point will have to be added to the KIP for this (after I’ve found the time to review the portion of the code that enacts this behaviour). Kind Regards, Luís Cabral From: Jun Rao Sent: 03 July 2018 23:58 To: dev Subject: Re: [VOTE] KIP-280: Enhanced log compaction Hi, Luis, Thanks for the KIP. Overall, this seems a useful KIP. A few comments below. 1. I guess both new configurations will be at the topic level? 2. Since the log cleaner now needs to keep both the offset and another long (say timestamp) in the de-dup map, it reduces the number of keys that we can keep in the map and thus may require more rounds of cleaning. This is probably not a big issue, but it will be useful to document this impact in the KIP. 3. With the new cleaning strategy, we want to be a bit careful with removing the last message in a partition (which is possible now). We need to preserve the offset of the last message so that we don't reuse the offset for a different message. One way to simply never remove the last message. Another way (suggested by Jason) is to create an empty message batch. Jun On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral <luis_cab...@yahoo.com.invalid> wrote: > Hi all, > > Any takers on having a look at this KIP and voting on it? > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > 280%3A+Enhanced+log+compaction > > Cheers, > Luis >