Hi Jun,

-:  1. I guess both new configurations will be at the topic level?

They will exist in the global configuration, at the very least.
I would like to have them on the topic level as well, but there is an 
inconsistency between the cleanup/compaction properties that exist “only 
globally” vs “globally + per topic”.
I haven’t gotten around to investigating why, and if that reason would then 
also impact the properties I’m suggesting. At first glance they seem to belong 
with the properties that are "only globally” configured, but Guozhang has 
written earlier with a suggestion of a compaction property that works for both 
(though I haven’t had the time to look into it yet, unfortunately).

-:  2. Since the log cleaner now needs to keep both the offset and another long 
(say timestamp) in the de-dup map, it reduces the number of keys that we can 
keep in the map and thus may require more rounds of cleaning. This is probably 
not a big issue, but it will be useful to document this impact in the KIP.

As a reader, I tend to prefer brief documentation on new features (they tend to 
be too many for me to find the willpower to read a 200-page essay about each 
one), so that influences me to avoid writing about every micro-impact that may 
exist, and simply leave it inferred (as you have just done).
But I also don’t feel strongly enough about it to argue either way. So, after 
reading my argument, if you still insist, I’ll happily add this there.

-: 3. With the new cleaning strategy, we want to be a bit careful with removing 
the last message in a partition (which is possible now). We need to preserve 
the offset of the last message so that we don't reuse the offset for a 
different message. One way to simply never remove the last message. Another way 
(suggested by Jason) is to create an empty message batch.

That is a good point, but isn’t the last message always kept regardless? In all 
of my tests with this approach, never have I seen it being removed. This is not 
because I made it so while changing the code, it was simply like this before, 
which made me smile!
Given these results, I just *assumed* (oops) that these scenarios represented 
the reality, so the compaction would only affected the “tail”, while the “head” 
remained untouched. Now that you say its possible that the last message 
actually gets overwritten somehow, I guess a new bullet point will have to be 
added to the KIP for this (after I’ve found the time to review the portion of 
the code that enacts this behaviour).

Kind Regards,
Luís Cabral

From: Jun Rao
Sent: 03 July 2018 23:58
To: dev
Subject: Re: [VOTE] KIP-280: Enhanced log compaction

Hi, Luis,

Thanks for the KIP. Overall, this seems a useful KIP. A few comments below.

1. I guess both new configurations will be at the topic level?
2. Since the log cleaner now needs to keep both the offset and another long
(say timestamp) in the de-dup map, it reduces the number of keys that we
can keep in the map and thus may require more rounds of cleaning. This is
probably not a big issue, but it will be useful to document this impact in
the KIP.
3. With the new cleaning strategy, we want to be a bit careful with
removing the last message in a partition (which is possible now). We need
to preserve the offset of the last message so that we don't reuse the
offset for a different message. One way to simply never remove the last
message. Another way (suggested by Jason) is to create an empty message
batch.

Jun

On Sat, Jun 9, 2018 at 12:39 AM, Luís Cabral <luis_cab...@yahoo.com.invalid>
wrote:

> Hi all,
>
> Any takers on having a look at this KIP and voting on it?
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 280%3A+Enhanced+log+compaction
>
> Cheers,
> Luis
>

Reply via email to