Thanks Bruno, I've made those changes.

I'll call a vote on the KIP later today.

Regards,

Nick Telford

On Wed, 12 Jan 2022 at 12:13, Bruno Cadonna <cado...@apache.org> wrote:

> Hi Nick,
>
> Great!
>
> I think the KIP is ready for voting. I have just a couple of minor
> comments.
>
> a.
> In the config description, I would replace
>
> "Purging will occur after at least this value since the last purge, but
> may be delayed until later in order to meet the processing guarantee."
>
> with
>
> "Purging will occur after at least this value since the last purge, but
> may be delayed until later."
>
> I do not really understand what you mean with "meet the processing
> guarantee" and I think it suffices to say that the purge might be delayed.
>
>
> b.
> I would change the title to
>
> "Add config min.repartition.purge.interval.ms to Kafka Streams"
>
>
> c.
> The current rejected alternative leaks implementation details since it
> refers to the coupling of purge to commit and we agreed to leave that
> out of the KIP.
>
>
> d.
> Could you add my proposal to specify the config as a multiple of the
> commit interval to the rejected alternatives with the reason why we
> discarded it?
>
> For the rest, I am +1.
>
> Best,
> Bruno
>
>
>
> On 11.01.22 16:47, Nick Telford wrote:
> > Hi Bruno,
> >
> > Thanks for your recommendations.
> >
> > I've made those changes, and also updated the description for the new
> > config to read: "The minimum frequency in milliseconds with which to
> delete
> > fully consumed records from repartition topics. *Purging will occur after
> > at least this value since the last purge, but may be delayed until later
> in
> > order to meet the processing guarantee.* The default value is the same as
> > the default for commit.interval.ms (30000).  (Note, unlike
> > commit.interval.ms, the default for this value remains unchanged when
> > processing.guarantee is set to exactly_once_v2)."
> >
> > This should make it clear that this is just a minimum interval, without
> > leaking too much detail in to the specification.
> >
> > If there are no other issues, I'll put this to a vote.
> >
> > Regards,
> >
> > Nick Telford
> >
> > On Tue, 11 Jan 2022 at 15:34, Bruno Cadonna <cado...@apache.org> wrote:
> >
> >> Hi Nick,
> >>
> >> Sorry for the delay!
> >>
> >> Regarding point 7, I am fine with keeping the config as an interval and
> >> keeping it independently of the commit interval. However, I would then
> >> remove the following paragraph from the KIP:
> >>
> >> "We will still wait for a commit before explicitly deleting repartition
> >> records, but we will only do so if the time since the last record
> >> deletion is at least repartition.purge.interval.ms. This means the
> >> lower-bound for repartition.purge.interval.ms  is effectively capped by
> >> the value of commit.interval.ms."
> >>
> >> The reason is that in the previous paragraph you say that the configs
> >> can be modified separately and then in this paragraph you bind the purge
> >> interval to the commit interval. This seems a contradiction and
> >> indicates that you are leaking too much implementation details into the
> >> KIP. I think, just saying that the purge interval is a minimum and name
> >> it accordingly without talking about the actual implementation makes the
> >> KIP more robust against future implementation changes.
> >>
> >> My proposal for the config name is min.repartition.purge.interval.ms or
> >> even min.purge.interval.ms with a preference for the former.
> >>
> >> Best,
> >> Bruno
> >>
> >>
> >>
> >> On 04.01.22 17:21, Nick Telford wrote:
> >>>>
> >>>> You missed one "delete.interval.ms" in the last paragraph in section
> >>>> "Proposed Changes".
> >>>>
> >>>
> >>> I've fixed these now, including the title that I somehow missed!
> >>>
> >>> I am afraid, I again need to comment on point 7. IMO, it does not make
> >>>
> >>> sense to be able to tune repartition.purge.interval.ms and
> >>>
> >>> commit.interval.ms separately when the purge can only happen during a
> >>>
> >>> commit. For example, if I set commit.interval.ms to 30000 ms and
> >>>
> >>> repartition.purge.interval.ms to 35000 ms, the records will be purged
> at
> >>>
> >>> every second commit, i.e., every 60000 ms. What benefit do users have
> to
> >>>
> >>> set repartition.purge.interval.ms separately from commit.interval.ms?
> >>>
> >>> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >>>
> >>> 2*commit.interval.ms..
> >>>
> >>>
> >>> Could we address this by chaning the name of the configuration to
> >> something
> >>> like "repartition.purge.min.interval.ms", to indicate that the
> >> repartition
> >>> purge interval will be *at least* this value?
> >>>
> >>> If that's still not suitable, are there any other existing
> configurations
> >>> that behave in a similar way, i.e. dictate a multiple of another
> >> interval,
> >>> that we could use as a basis for a new name for this configuration?
> >>>
> >>> Additionally, I have a new point.
> >>>
> >>> 8. If user code has access to the processor context (e.g. in the
> >>>
> >>> processor API), a commit can also be requested on demand by user code.
> >>>
> >>> The KIP should clarify if purges might also happen during requested
> >>>
> >>> commits or if purges only happen during automatic commits.
> >>>
> >>>
> >>> Good point. I'm holding off on amending this for now until we agree on
> an
> >>> outcome for point 7 above, because I suspect it may at least influence
> >> the
> >>> wording of this section.
> >>>
> >>> Thanks for the feedback, and sorry for the delay. Now that the holidays
> >> are
> >>> over I'm able to focus on this again.
> >>>
> >>> Regards,
> >>>
> >>> Nick Telford
> >>>
> >>> On Wed, 22 Dec 2021 at 09:28, Bruno Cadonna <cado...@apache.org>
> wrote:
> >>>
> >>>> Hi Nick,
> >>>>
> >>>> Thanks for the updates!
> >>>>
> >>>> The motivation section and the description of the new config is much
> >>>> clearer and informative now!
> >>>>
> >>>> You missed one "delete.interval.ms" in the last paragraph in section
> >>>> "Proposed Changes".
> >>>>
> >>>> I am afraid, I again need to comment on point 7. IMO, it does not make
> >>>> sense to be able to tune repartition.purge.interval.ms and
> >>>> commit.interval.ms separately when the purge can only happen during a
> >>>> commit. For example, if I set commit.interval.ms to 30000 ms and
> >>>> repartition.purge.interval.ms to 35000 ms, the records will be purged
> >> at
> >>>> every second commit, i.e., every 60000 ms. What benefit do users have
> to
> >>>> set repartition.purge.interval.ms separately from commit.interval.ms?
> >>>> The rate of purging will never be 1 / 35000, the rate will be 1 /
> >>>> 2*commit.interval.ms..
> >>>>
> >>>> Additionally, I have a new point.
> >>>> 8. If user code has access to the processor context (e.g. in the
> >>>> processor API), a commit can also be requested on demand by user code.
> >>>> The KIP should clarify if purges might also happen during requested
> >>>> commits or if purges only happen during automatic commits.
> >>>>
> >>>> Best,
> >>>> Bruno
> >>>>
> >>>> On 21.12.21 20:40, Nick Telford wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> Thanks for your feedback. I've made the suggested changes to the KIP
> >> (1,
> >>>> 2,
> >>>>> 3 and 5).
> >>>>>
> >>>>> For the new name, I've chosen repartition.purge.interval.ms, as I
> felt
> >>>> it
> >>>>> struck a good balance between being self-descriptive and concise.
> >> Please
> >>>>> let me know if you'd prefer something else.
> >>>>>
> >>>>> On point 6: My PR has basic validation to ensure the value is
> positive,
> >>>> but
> >>>>> I don't think it's necessary to have dynamic validation, to ensure
> it's
> >>>> not
> >>>>> less than commit.interval.ms. The reason is that it will be
> implicitly
> >>>>> limited to that value anyway, and won't break anything. But I can add
> >> it
> >>>> if
> >>>>> you'd prefer it.
> >>>>>
> >>>>> On point 7: I worry that defaulting it to follow the value of
> >>>>> commit.interval.ms may confuse users, who will likely expect the
> >>>> default to
> >>>>> not be affected by changes to other configuration options. I can see
> >> the
> >>>>> appeal of retaining the existing behaviour (following the commit
> >>>> interval)
> >>>>> by default, but I believe that the majority of users who customize
> >>>>> commit.interval.ms do not desire a different frequency of
> repartition
> >>>>> record purging as well.
> >>>>>
> >>>>> As for multiples of commit interval: I think the argument against
> that
> >> is
> >>>>> that an interval is more intuitive when given as a time, rather than
> >> as a
> >>>>> multiple of some other operation. Users configuring this should not
> >> need
> >>>> to
> >>>>> break out a calculator to work out how frequently the records are
> going
> >>>> to
> >>>>> be purged!
> >>>>>
> >>>>> I've also updated the PR with the relevant changes.
> >>>>>
> >>>>> BTW, for some reason I didn't receive Sophie's email. I'll
> periodically
> >>>>> check the thread on the archive to ensure I don't miss any more of
> your
> >>>>> messages!
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Nick
> >>>>>
> >>>>> On Tue, 21 Dec 2021 at 12:34, Luke Chen <show...@gmail.com> wrote:
> >>>>>
> >>>>>> Thanks, Bruno.
> >>>>>>
> >>>>>> I agree my point 4 is more like PR discussion, not KIP discussion.
> >>>>>> @Nick , please update my point 4 in PR directly.
> >>>>>>
> >>>>>> Thank you.
> >>>>>> Luke
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Dec 21, 2021 at 7:24 PM Bruno Cadonna <cado...@apache.org>
> >>>> wrote:
> >>>>>>
> >>>>>>> Hi Nick,
> >>>>>>>
> >>>>>>> Thank you for the KIP!
> >>>>>>>
> >>>>>>> I agree on points 1, 2, and 3. I am not sure about point 4. I agree
> >>>> that
> >>>>>>> we should update the docs for commit.interval.ms but I am not sure
> >> if
> >>>>>>> this needs to mentioned in the KIP. That seems to me more a PR
> >>>>>>> discussion. Also on point 2, I agree that we need to add a doc
> string
> >>>>>>> but the content should be exemplary not binding. What I want to say
> >> is
> >>>>>>> that, we do not need a KIP to change docs.
> >>>>>>>
> >>>>>>> Here my points:
> >>>>>>>
> >>>>>>> 5. Could you specify in the motivation that the KIP is about
> deleting
> >>>>>>> records from repartition topics? Maybe with a short description
> when
> >>>> why
> >>>>>>> and when records are deleted from the repartition topics. For us it
> >>>>>>> might be clear, but IMO we should try to write KIPs so that someone
> >>>> that
> >>>>>>> is relatively new to Kafka Streams can understand the KIP without
> >>>>>>> needing to know too much background.
> >>>>>>>
> >>>>>>> 6. Does the config need to be validated? For example, does
> >>>>>>> delete.interval.ms need to be greater than or equal to
> >>>>>> commit.interval.ms?
> >>>>>>>
> >>>>>>> 7. Should the default value for non-EOS be 30s or the same value as
> >>>>>>> commit.interval.ms? I am just thinking about the case where a user
> >>>>>>> explicitly changes commit.interval.ms but not delete.interval.ms
> (or
> >>>>>>> whatever name you come up for it). Once delete.interval.ms is set
> >>>>>>> explicitly it is decoupled from commit.interval.ms. Similar could
> be
> >>>>>>> done for the EOS case.
> >>>>>>> Alternatively, we could also define delete.interval.ms to take a
> >>>>>>> integral number without a unit that specifies after how many commit
> >>>>>>> intervals the records in repartition topics should be deleted. This
> >>>>>>> would make sense since delete.interval.ms is tightly bound to
> >>>>>>> commit.interval.ms. Additionally, it would make the semantics of
> the
> >>>>>>> config simpler. The name of the config should definitely change if
> we
> >>>> go
> >>>>>>> down this way.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Bruno
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 21.12.21 11:14, Luke Chen wrote:
> >>>>>>>> Hi Nick,
> >>>>>>>>
> >>>>>>>> Thanks for the KIP!
> >>>>>>>>
> >>>>>>>> In addition to Sophie's comments, I have one more to this KIP:
> >>>>>>>> 3. I think you should mention the behavior change *explicitly* in
> >>>>>>>> "Compatibility" section. I know you already mentioned it in KIP,
> in
> >>>> the
> >>>>>>>> benefit way. But I think in this section, we should clearly point
> >> out
> >>>>>>> which
> >>>>>>>> behavior will be change after this KIP. That is, you should put it
> >>>>>> clear
> >>>>>>>> that the delete record interval will change from 100ms to 30s with
> >> EOS
> >>>>>>>> enabled. And it should also be mentioned in doc/upgrade.html doc.
> >>>>>>>> 4. Since this new config has some relationship with
> >>>> commit.interval.ms
> >>>>>> ,
> >>>>>>> I
> >>>>>>>> think we should also update the doc description for `
> >>>>>> commit.interval.ms
> >>>>>>> `,
> >>>>>>>> to let user know there's another config to control delete interval
> >> and
> >>>>>>>> should be greater than commit.interval.ms. Something like that.
> >> WDYT?
> >>>>>>> (You
> >>>>>>>> should put this change in the KIP as Sophie mentioned)
> >>>>>>>>
> >>>>>>>> Thank you.
> >>>>>>>> Luke
> >>>>>>>>
> >>>>>>>> On Tue, Dec 21, 2021 at 9:27 AM Sophie Blee-Goldman
> >>>>>>>> <sop...@confluent.io.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> Hey Nick,
> >>>>>>>>>
> >>>>>>>>> I think you forgot to link to the KIP document, but I take it
> this
> >> is
> >>>>>>>>> it: KIP-811:
> >>>>>>>>> Add separate delete.interval.ms to Kafka Streams
> >>>>>>>>> <https://cwiki.apache.org/confluence/x/JY-kCw>
> >>>>>>>>>
> >>>>>>>>> The overall proposal sounds good to me, just a few minor things:
> >>>>>>>>>
> >>>>>>>>>        1. Please specify everything needed to define this config
> >>>>>>> explicitly, ie
> >>>>>>>>>        all the arguments that will be passed in to the
> >>>>>>>>>        StreamsConfig's ConfigDef: in addition to the default
> value,
> >> we
> >>>>>>> need the
> >>>>>>>>>        config type (presumably a Long), the doc
> >>>>>>>>>        string, and the importance (probably "low", similar to
> >>>>>>>>> commit.interval.ms
> >>>>>>>>>        )
> >>>>>>>>>        2. Might be worth considering a slightly more descriptive
> >> name
> >>>> for
> >>>>>>> this
> >>>>>>>>>        config. Most users probably don't think about,
> >>>>>>>>>        or may not even be aware of, the deletion of consumed
> >> records by
> >>>>>>> Kafka
> >>>>>>>>>        Streams, so calling it something along
> >>>>>>>>>        the lines of "repartition.records.delete.interval.ms" or
> "
> >>>>>>>>>        consumed.records.deletion.interval.ms" or so on will help
> >>>>>>>>>        make it clear what the config refers to and whether or not
> >> they
> >>>>>>> need to
> >>>>>>>>>        care. Maybe you can come up with better
> >>>>>>>>>        and/or shorter names, just wanted to suggest some example
> >> names
> >>>>>>> that I
> >>>>>>>>>        think sufficiently get the point across
> >>>>>>>>>
> >>>>>>>>> Other than that I'm +1 -- thanks for the KIP!
> >>>>>>>>>
> >>>>>>>>> Sophie
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Dec 20, 2021 at 9:15 AM Nick Telford <
> >> nick.telf...@gmail.com
> >>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> This is a KIP for a proposed solution to KAFKA-13549
> >>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-13549>. The
> solution
> >>>> is
> >>>>>>>>> very
> >>>>>>>>>> simple, so the KIP is pretty short.
> >>>>>>>>>>
> >>>>>>>>>> The suggested changes are implemented by this PR
> >>>>>>>>>> <https://github.com/apache/kafka/pull/11610>.
> >>>>>>>>>> --
> >>>>>>>>>> Nick Telford
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Reply via email to