Thanks Bruno, I've made those changes. I'll call a vote on the KIP later today.
Regards, Nick Telford On Wed, 12 Jan 2022 at 12:13, Bruno Cadonna <cado...@apache.org> wrote: > Hi Nick, > > Great! > > I think the KIP is ready for voting. I have just a couple of minor > comments. > > a. > In the config description, I would replace > > "Purging will occur after at least this value since the last purge, but > may be delayed until later in order to meet the processing guarantee." > > with > > "Purging will occur after at least this value since the last purge, but > may be delayed until later." > > I do not really understand what you mean with "meet the processing > guarantee" and I think it suffices to say that the purge might be delayed. > > > b. > I would change the title to > > "Add config min.repartition.purge.interval.ms to Kafka Streams" > > > c. > The current rejected alternative leaks implementation details since it > refers to the coupling of purge to commit and we agreed to leave that > out of the KIP. > > > d. > Could you add my proposal to specify the config as a multiple of the > commit interval to the rejected alternatives with the reason why we > discarded it? > > For the rest, I am +1. > > Best, > Bruno > > > > On 11.01.22 16:47, Nick Telford wrote: > > Hi Bruno, > > > > Thanks for your recommendations. > > > > I've made those changes, and also updated the description for the new > > config to read: "The minimum frequency in milliseconds with which to > delete > > fully consumed records from repartition topics. *Purging will occur after > > at least this value since the last purge, but may be delayed until later > in > > order to meet the processing guarantee.* The default value is the same as > > the default for commit.interval.ms (30000). (Note, unlike > > commit.interval.ms, the default for this value remains unchanged when > > processing.guarantee is set to exactly_once_v2)." > > > > This should make it clear that this is just a minimum interval, without > > leaking too much detail in to the specification. > > > > If there are no other issues, I'll put this to a vote. > > > > Regards, > > > > Nick Telford > > > > On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <cado...@apache.org> wrote: > > > >> Hi Nick, > >> > >> Sorry for the delay! > >> > >> Regarding point 7, I am fine with keeping the config as an interval and > >> keeping it independently of the commit interval. However, I would then > >> remove the following paragraph from the KIP: > >> > >> "We will still wait for a commit before explicitly deleting repartition > >> records, but we will only do so if the time since the last record > >> deletion is at least repartition.purge.interval.ms. This means the > >> lower-bound for repartition.purge.interval.ms is effectively capped by > >> the value of commit.interval.ms." > >> > >> The reason is that in the previous paragraph you say that the configs > >> can be modified separately and then in this paragraph you bind the purge > >> interval to the commit interval. This seems a contradiction and > >> indicates that you are leaking too much implementation details into the > >> KIP. I think, just saying that the purge interval is a minimum and name > >> it accordingly without talking about the actual implementation makes the > >> KIP more robust against future implementation changes. > >> > >> My proposal for the config name is min.repartition.purge.interval.ms or > >> even min.purge.interval.ms with a preference for the former. > >> > >> Best, > >> Bruno > >> > >> > >> > >> On 04.01.22 17:21, Nick Telford wrote: > >>>> > >>>> You missed one "delete.interval.ms" in the last paragraph in section > >>>> "Proposed Changes". > >>>> > >>> > >>> I've fixed these now, including the title that I somehow missed! > >>> > >>> I am afraid, I again need to comment on point 7. IMO, it does not make > >>> > >>> sense to be able to tune repartition.purge.interval.ms and > >>> > >>> commit.interval.ms separately when the purge can only happen during a > >>> > >>> commit. For example, if I set commit.interval.ms to 30000 ms and > >>> > >>> repartition.purge.interval.ms to 35000 ms, the records will be purged > at > >>> > >>> every second commit, i.e., every 60000 ms. What benefit do users have > to > >>> > >>> set repartition.purge.interval.ms separately from commit.interval.ms? > >>> > >>> The rate of purging will never be 1 / 35000, the rate will be 1 / > >>> > >>> 2*commit.interval.ms.. > >>> > >>> > >>> Could we address this by chaning the name of the configuration to > >> something > >>> like "repartition.purge.min.interval.ms", to indicate that the > >> repartition > >>> purge interval will be *at least* this value? > >>> > >>> If that's still not suitable, are there any other existing > configurations > >>> that behave in a similar way, i.e. dictate a multiple of another > >> interval, > >>> that we could use as a basis for a new name for this configuration? > >>> > >>> Additionally, I have a new point. > >>> > >>> 8. If user code has access to the processor context (e.g. in the > >>> > >>> processor API), a commit can also be requested on demand by user code. > >>> > >>> The KIP should clarify if purges might also happen during requested > >>> > >>> commits or if purges only happen during automatic commits. > >>> > >>> > >>> Good point. I'm holding off on amending this for now until we agree on > an > >>> outcome for point 7 above, because I suspect it may at least influence > >> the > >>> wording of this section. > >>> > >>> Thanks for the feedback, and sorry for the delay. Now that the holidays > >> are > >>> over I'm able to focus on this again. > >>> > >>> Regards, > >>> > >>> Nick Telford > >>> > >>> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <cado...@apache.org> > wrote: > >>> > >>>> Hi Nick, > >>>> > >>>> Thanks for the updates! > >>>> > >>>> The motivation section and the description of the new config is much > >>>> clearer and informative now! > >>>> > >>>> You missed one "delete.interval.ms" in the last paragraph in section > >>>> "Proposed Changes". > >>>> > >>>> I am afraid, I again need to comment on point 7. IMO, it does not make > >>>> sense to be able to tune repartition.purge.interval.ms and > >>>> commit.interval.ms separately when the purge can only happen during a > >>>> commit. For example, if I set commit.interval.ms to 30000 ms and > >>>> repartition.purge.interval.ms to 35000 ms, the records will be purged > >> at > >>>> every second commit, i.e., every 60000 ms. What benefit do users have > to > >>>> set repartition.purge.interval.ms separately from commit.interval.ms? > >>>> The rate of purging will never be 1 / 35000, the rate will be 1 / > >>>> 2*commit.interval.ms.. > >>>> > >>>> Additionally, I have a new point. > >>>> 8. If user code has access to the processor context (e.g. in the > >>>> processor API), a commit can also be requested on demand by user code. > >>>> The KIP should clarify if purges might also happen during requested > >>>> commits or if purges only happen during automatic commits. > >>>> > >>>> Best, > >>>> Bruno > >>>> > >>>> On 21.12.21 20:40, Nick Telford wrote: > >>>>> Hi everyone, > >>>>> > >>>>> Thanks for your feedback. I've made the suggested changes to the KIP > >> (1, > >>>> 2, > >>>>> 3 and 5). > >>>>> > >>>>> For the new name, I've chosen repartition.purge.interval.ms, as I > felt > >>>> it > >>>>> struck a good balance between being self-descriptive and concise. > >> Please > >>>>> let me know if you'd prefer something else. > >>>>> > >>>>> On point 6: My PR has basic validation to ensure the value is > positive, > >>>> but > >>>>> I don't think it's necessary to have dynamic validation, to ensure > it's > >>>> not > >>>>> less than commit.interval.ms. The reason is that it will be > implicitly > >>>>> limited to that value anyway, and won't break anything. But I can add > >> it > >>>> if > >>>>> you'd prefer it. > >>>>> > >>>>> On point 7: I worry that defaulting it to follow the value of > >>>>> commit.interval.ms may confuse users, who will likely expect the > >>>> default to > >>>>> not be affected by changes to other configuration options. I can see > >> the > >>>>> appeal of retaining the existing behaviour (following the commit > >>>> interval) > >>>>> by default, but I believe that the majority of users who customize > >>>>> commit.interval.ms do not desire a different frequency of > repartition > >>>>> record purging as well. > >>>>> > >>>>> As for multiples of commit interval: I think the argument against > that > >> is > >>>>> that an interval is more intuitive when given as a time, rather than > >> as a > >>>>> multiple of some other operation. Users configuring this should not > >> need > >>>> to > >>>>> break out a calculator to work out how frequently the records are > going > >>>> to > >>>>> be purged! > >>>>> > >>>>> I've also updated the PR with the relevant changes. > >>>>> > >>>>> BTW, for some reason I didn't receive Sophie's email. I'll > periodically > >>>>> check the thread on the archive to ensure I don't miss any more of > your > >>>>> messages! > >>>>> > >>>>> Regards, > >>>>> > >>>>> Nick > >>>>> > >>>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <show...@gmail.com> wrote: > >>>>> > >>>>>> Thanks, Bruno. > >>>>>> > >>>>>> I agree my point 4 is more like PR discussion, not KIP discussion. > >>>>>> @Nick , please update my point 4 in PR directly. > >>>>>> > >>>>>> Thank you. > >>>>>> Luke > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <cado...@apache.org> > >>>> wrote: > >>>>>> > >>>>>>> Hi Nick, > >>>>>>> > >>>>>>> Thank you for the KIP! > >>>>>>> > >>>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree > >>>> that > >>>>>>> we should update the docs for commit.interval.ms but I am not sure > >> if > >>>>>>> this needs to mentioned in the KIP. That seems to me more a PR > >>>>>>> discussion. Also on point 2, I agree that we need to add a doc > string > >>>>>>> but the content should be exemplary not binding. What I want to say > >> is > >>>>>>> that, we do not need a KIP to change docs. > >>>>>>> > >>>>>>> Here my points: > >>>>>>> > >>>>>>> 5. Could you specify in the motivation that the KIP is about > deleting > >>>>>>> records from repartition topics? Maybe with a short description > when > >>>> why > >>>>>>> and when records are deleted from the repartition topics. For us it > >>>>>>> might be clear, but IMO we should try to write KIPs so that someone > >>>> that > >>>>>>> is relatively new to Kafka Streams can understand the KIP without > >>>>>>> needing to know too much background. > >>>>>>> > >>>>>>> 6. Does the config need to be validated? For example, does > >>>>>>> delete.interval.ms need to be greater than or equal to > >>>>>> commit.interval.ms? > >>>>>>> > >>>>>>> 7. Should the default value for non-EOS be 30s or the same value as > >>>>>>> commit.interval.ms? I am just thinking about the case where a user > >>>>>>> explicitly changes commit.interval.ms but not delete.interval.ms > (or > >>>>>>> whatever name you come up for it). Once delete.interval.ms is set > >>>>>>> explicitly it is decoupled from commit.interval.ms. Similar could > be > >>>>>>> done for the EOS case. > >>>>>>> Alternatively, we could also define delete.interval.ms to take a > >>>>>>> integral number without a unit that specifies after how many commit > >>>>>>> intervals the records in repartition topics should be deleted. This > >>>>>>> would make sense since delete.interval.ms is tightly bound to > >>>>>>> commit.interval.ms. Additionally, it would make the semantics of > the > >>>>>>> config simpler. The name of the config should definitely change if > we > >>>> go > >>>>>>> down this way. > >>>>>>> > >>>>>>> Best, > >>>>>>> Bruno > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On 21.12.21 11:14, Luke Chen wrote: > >>>>>>>> Hi Nick, > >>>>>>>> > >>>>>>>> Thanks for the KIP! > >>>>>>>> > >>>>>>>> In addition to Sophie's comments, I have one more to this KIP: > >>>>>>>> 3. I think you should mention the behavior change *explicitly* in > >>>>>>>> "Compatibility" section. I know you already mentioned it in KIP, > in > >>>> the > >>>>>>>> benefit way. But I think in this section, we should clearly point > >> out > >>>>>>> which > >>>>>>>> behavior will be change after this KIP. That is, you should put it > >>>>>> clear > >>>>>>>> that the delete record interval will change from 100ms to 30s with > >> EOS > >>>>>>>> enabled. And it should also be mentioned in doc/upgrade.html doc. > >>>>>>>> 4. Since this new config has some relationship with > >>>> commit.interval.ms > >>>>>> , > >>>>>>> I > >>>>>>>> think we should also update the doc description for ` > >>>>>> commit.interval.ms > >>>>>>> `, > >>>>>>>> to let user know there's another config to control delete interval > >> and > >>>>>>>> should be greater than commit.interval.ms. Something like that. > >> WDYT? > >>>>>>> (You > >>>>>>>> should put this change in the KIP as Sophie mentioned) > >>>>>>>> > >>>>>>>> Thank you. > >>>>>>>> Luke > >>>>>>>> > >>>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman > >>>>>>>> <sop...@confluent.io.invalid> wrote: > >>>>>>>> > >>>>>>>>> Hey Nick, > >>>>>>>>> > >>>>>>>>> I think you forgot to link to the KIP document, but I take it > this > >> is > >>>>>>>>> it: KIP-811: > >>>>>>>>> Add separate delete.interval.ms to Kafka Streams > >>>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw> > >>>>>>>>> > >>>>>>>>> The overall proposal sounds good to me, just a few minor things: > >>>>>>>>> > >>>>>>>>> 1. Please specify everything needed to define this config > >>>>>>> explicitly, ie > >>>>>>>>> all the arguments that will be passed in to the > >>>>>>>>> StreamsConfig's ConfigDef: in addition to the default > value, > >> we > >>>>>>> need the > >>>>>>>>> config type (presumably a Long), the doc > >>>>>>>>> string, and the importance (probably "low", similar to > >>>>>>>>> commit.interval.ms > >>>>>>>>> ) > >>>>>>>>> 2. Might be worth considering a slightly more descriptive > >> name > >>>> for > >>>>>>> this > >>>>>>>>> config. Most users probably don't think about, > >>>>>>>>> or may not even be aware of, the deletion of consumed > >> records by > >>>>>>> Kafka > >>>>>>>>> Streams, so calling it something along > >>>>>>>>> the lines of "repartition.records.delete.interval.ms" or > " > >>>>>>>>> consumed.records.deletion.interval.ms" or so on will help > >>>>>>>>> make it clear what the config refers to and whether or not > >> they > >>>>>>> need to > >>>>>>>>> care. Maybe you can come up with better > >>>>>>>>> and/or shorter names, just wanted to suggest some example > >> names > >>>>>>> that I > >>>>>>>>> think sufficiently get the point across > >>>>>>>>> > >>>>>>>>> Other than that I'm +1 -- thanks for the KIP! > >>>>>>>>> > >>>>>>>>> Sophie > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford < > >> nick.telf...@gmail.com > >>>>> > >>>>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549 > >>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The > solution > >>>> is > >>>>>>>>> very > >>>>>>>>>> simple, so the KIP is pretty short. > >>>>>>>>>> > >>>>>>>>>> The suggested changes are implemented by this PR > >>>>>>>>>> <https://github.com/apache/kafka/pull/11610>. > >>>>>>>>>> -- > >>>>>>>>>> Nick Telford > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > >