Awesome. Thanks Tom!

I plan to open this KIP vote at the start of next week.
Thanks all for the discussion! Let me know if there is anything else. :)

Justine

On Wed, Aug 3, 2022 at 11:32 AM Tom Bentley <tbent...@redhat.com> wrote:

> Hi Justine,
>
> That all seems reasonable to me, thanks!
>
> On Wed, 3 Aug 2022 at 19:14, Justine Olshan <jols...@confluent.io.invalid>
> wrote:
>
> > Hi Tom and Ismael,
> >
> > 1. Yes, there are definitely many ways to improve this issue and I plan
> to
> > write followup KIPs to address some of the larger changes.
> > Just wanted to get this simple fix in as a short term measure to prevent
> > issues with too many producer IDs in the cache. Stay tuned :)
> >
> > 2. I did have some offline discussion about informing the client. I think
> > for this specific KIP the default behavior in practice should not change
> > enough to require this information to go back to the client. In other
> > words, a reasonable configuration should not regress behavior. However,
> > with the further changes I mention in 1, perhaps this is something we
> want
> > to do. And yes -- unfortunately the current state of Kafka is no longer
> > totally consistent with KIP-98. This is something we probably want to
> > clarify in the future.
> >
> > 3. I will update the config to mention it is not dynamic. I think since
> the
> > transactional id configuration is read-only, this should be too.
> >
> > 4. I can update this wording.
> >
> > 5. I think there are definitely benefits to the name `
> > idempotent.pid.expiration.ms` but there are other ways this could cause
> > confusion. And to be clear -- the configuration can expire a producer ID
> > for a transactional producer as long as there isn't an ongoing
> transaction.
> >
> > Let me know if you have any questions and thanks for taking a look!
> >
> > Justine
> >
> > On Wed, Aug 3, 2022 at 9:30 AM Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Regarding 1, more can certainly be done, but I think it would be
> > > complementary. As such, I think this KIP stands on its own and
> additional
> > > improvements can be handled via future KIPs (unless Justine wants to
> > > combine things, of course).
> > >
> > > Ismael
> > >
> > > On Wed, Aug 3, 2022 at 9:12 AM Tom Bentley <tbent...@redhat.com>
> wrote:
> > >
> > > > Hi Justine,
> > > >
> > > > Thanks for the KIP! I can see that this is a pragmatic attempt to
> > > address a
> > > > nasty problem. I have a few questions:
> > > >
> > > > 1. The KIP makes the problem significantly harder to trigger, but
> > doesn't
> > > > eliminate it entirely. How confident are you that it will be
> sufficient
> > > in
> > > > practice? We can point to applications which are creating idempotent
> > > > producers at a high rate and say they're broken, but that doesn't do
> > > > anything to defend the broker from an interaction pattern that
> differs
> > > only
> > > > in rate from a "good application". Did you consider a new quota to
> > limit
> > > > the rate at which a (principal, clientId) can allocate new PIDs?
> > > >
> > > > 2. The KIP contains this sentence: "when an idempotent producer’s ID
> > > > expires, it silently loses its idempotency guarantees." That's at
> odds
> > > with
> > > > my reading of "PID expiration" in the KIP-98 design[1], but it does
> > seem
> > > > consistent with a (brief!) look at the code. I accept that the risk
> > > should
> > > > be minimal so long as the expiration time is > the producer's
> delivery
> > > > timeout, but it would still be nice if we could detect this situation
> > and
> > > > return an error to the client. Is there a reason for the apparent
> > > deviation
> > > > from KIP-98 (or am I misreading the code?)
> > > >
> > > > 3. Could the KIP be explicit on whether the new config will be
> > > dynamically
> > > > changeable?
> > > >
> > > > 4. The description of producer.id.expiration.ms mentions the
> > > > ProducerStateManager, which will mean nothing to a normal user. We
> > could
> > > > probably change it to "a topic partition leader" without loss of
> > meaning.
> > > >
> > > > 5. The description also says "Producer IDs will not expire while a
> > > > transaction associated to them is still ongoing." To me this suggests
> > > that
> > > > a more intuitive name for this config (from the user PoV) would
> include
> > > > "idempotent", since it does not cover the transactional case. (I
> would
> > > > suggest "idempotent.pid.expiration.ms" (c.f.
> > > > transactional.id.expiration.ms),
> > > > but the distinction between "id" and "pid" is easily missed–even if
> > it's
> > > > technically correct–so I'm not sure it's better than what you're
> > > > proposing). I only mention it in case it prompts someone else to
> find a
> > > > better name.
> > > >
> > > > Thanks again,
> > > >
> > > > Tom
> > > >
> > > > [1]:
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/11Jqy_GjUGtdXJK94XGsEIK7CP1SnQGdp2eF0wSw9ra8/edit#heading=h.loujdamc9ptj
> > > >
> > > > On Tue, 2 Aug 2022 at 22:00, Justine Olshan
> > <jols...@confluent.io.invalid
> > > >
> > > > wrote:
> > > >
> > > > > I've updated the KIP to make the new minimum value 1 and remove the
> > -1
> > > > > configuration.
> > > > > I've also added the low priority to the configuration and edited
> the
> > > > > description as Ismael mentioned.
> > > > >
> > > > > I'm thinking about bringing this KIP to a vote soon! Let me know if
> > > there
> > > > > are any other comments or questions.
> > > > >
> > > > > Thanks,
> > > > > Justine
> > > > >
> > > > > On Tue, Aug 2, 2022 at 9:02 AM Jason Gustafson
> > > > <ja...@confluent.io.invalid
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > I agree with Ismael that we should remove -1. I think we tend to
> > view
> > > > the
> > > > > > coupling of these behaviors into a single configuration as a
> > mistake,
> > > > so
> > > > > > it's a little odd to keep it (even if in a weakened form).
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > > > On Sat, Jul 30, 2022 at 7:37 AM Ismael Juma <ism...@juma.me.uk>
> > > wrote:
> > > > > >
> > > > > > > I would remove -1 altogether. Two more comments:
> > > > > > >
> > > > > > > 1. The current description kind of leads people towards
> aligning
> > > the
> > > > > > config
> > > > > > > with delivery.timeout.ms. Is that what we want? We could say
> it
> > > > should
> > > > > > be
> > > > > > > higher than delivery.timeout.ms but indicate that the default
> is
> > > > > usually
> > > > > > > fine. The main reason to reduce it would be to save memory, I
> > > guess.
> > > > > > > 2. Each config has a priority, we should specify it for this
> one.
> > > I'm
> > > > > > > assuming it will be "low".
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Fri, Jul 29, 2022 at 2:38 PM Justine Olshan
> > > > > > > <jols...@confluent.io.invalid>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > I've updated the KIP to include the new default of 1 day and
> > > > > > information
> > > > > > > > about -1 in the description of the config.
> > > > > > > > I wonder though if including -1 makes sense now that it is
> not
> > > the
> > > > > > > default
> > > > > > > > value. Is there a benefit for manually setting -1 vs manually
> > > > setting
> > > > > > the
> > > > > > > > value that transactional.id.expiration.ms has?
> > > > > > > >
> > > > > > > > Let me know your thoughts.
> > > > > > > > Thanks,
> > > > > > > > Justine
> > > > > > > >
> > > > > > > > On Fri, Jul 29, 2022 at 10:54 AM Ismael Juma <
> > ism...@juma.me.uk>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 for having 1 day as the default and for including this
> > > change
> > > > in
> > > > > > the
> > > > > > > > > release notes.
> > > > > > > > >
> > > > > > > > > Ismael
> > > > > > > > >
> > > > > > > > > On Wed, Jul 27, 2022 at 9:16 AM Jason Gustafson
> > > > > > > > <ja...@confluent.io.invalid
> > > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I don't think a major release is a requirement for a
> > default
> > > > > change
> > > > > > > for
> > > > > > > > > > what it's worth. I do appreciate that there is a
> preference
> > > for
> > > > > not
> > > > > > > > > rocking
> > > > > > > > > > the boat though. For a little bit of background here, the
> > > > problem
> > > > > > we
> > > > > > > > > > have encountered in production since the idempotent
> > producer
> > > > > became
> > > > > > > the
> > > > > > > > > > default is OOM errors due to huge numbers of producerIds
> > that
> > > > had
> > > > > > to
> > > > > > > be
> > > > > > > > > > retained in the cache for 7 days. It is hard to prevent
> use
> > > > cases
> > > > > > > from
> > > > > > > > > > emerging where producers are used and discarded rapidly.
> We
> > > > will
> > > > > be
> > > > > > > > > using a
> > > > > > > > > > lower value for sure, but it would also be nice to reduce
> > the
> > > > > > > > likelihood
> > > > > > > > > > for the community to see this problem. The benefit of the
> > > > caching
> > > > > > > > > > diminishes quickly over time since it is primarily meant
> to
> > > > > handle
> > > > > > > > > producer
> > > > > > > > > > retry windows. I do not think there is much difference
> > > between
> > > > 1
> > > > > > days
> > > > > > > > > and 7
> > > > > > > > > > days from an application perspective, but it is a huge
> > > > difference
> > > > > > for
> > > > > > > > the
> > > > > > > > > > broker's memory usage.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Jason
> > > > > > > > > >
> > > > > > > > > > On Wed, Jul 27, 2022 at 2:57 AM Sagar <
> > > > sagarmeansoc...@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks Justine for the KIP. I think it might be better
> to
> > > > > > document
> > > > > > > > the
> > > > > > > > > > > correlation between the new config and
> > delivery.timeout.ms
> > > > in
> > > > > > the
> > > > > > > > > Public
> > > > > > > > > > > Interfaces Description.
> > > > > > > > > > >
> > > > > > > > > > > Also, I agree with Luke that for now setting a default
> to
> > > -1
> > > > > > should
> > > > > > > > be
> > > > > > > > > > > good. We can look to switch to 1 day with major
> release.
> > > > > > > > > > >
> > > > > > > > > > > Thanks!
> > > > > > > > > > > Sagar.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jul 27, 2022 at 9:05 AM Luke Chen <
> > > show...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the KIP.
> > > > > > > > > > > > I agree with you that we should try our best to keep
> > > > backward
> > > > > > > > > > > > compatibility, although our intention is to have
> lower
> > > > > producer
> > > > > > > id
> > > > > > > > > > > > expiration timeout.
> > > > > > > > > > > > So, I think we should keep default to -1 IMO.
> > > > > > > > > > > > Maybe we change the default to 1 day in next major
> > > release
> > > > > > (4.0)?
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you.
> > > > > > > > > > > > Luke
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jul 27, 2022 at 7:13 AM Justine Olshan
> > > > > > > > > > > > <jols...@confluent.io.invalid>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for taking a look Jason!
> > > > > > > > > > > > >
> > > > > > > > > > > > > I wondered if we wanted to have a smaller default
> but
> > > > > wasn't
> > > > > > > sure
> > > > > > > > > > about
> > > > > > > > > > > > the
> > > > > > > > > > > > > compatibility story -- especially since there is
> the
> > > > chance
> > > > > > for
> > > > > > > > > > > producer
> > > > > > > > > > > > > IDs to expire silently.
> > > > > > > > > > > > > I do think that 1 day is fairly reasonable. If I
> > don't
> > > > hear
> > > > > > any
> > > > > > > > > > > > conflicting
> > > > > > > > > > > > > opinions I can go ahead and update the default.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Justine
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Jul 26, 2022 at 12:23 PM Jason Gustafson
> > > > > > > > > > > > > <ja...@confluent.io.invalid>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the KIP. Although I hate seeing new
> > > > > > > configurations,
> > > > > > > > I
> > > > > > > > > > > think
> > > > > > > > > > > > > this
> > > > > > > > > > > > > > is a good change. Combining these timeout
> behaviors
> > > > into
> > > > > a
> > > > > > > > single
> > > > > > > > > > > > > > configuration was definitely a mistake, but we
> > didn't
> > > > > > > > anticipate
> > > > > > > > > > the
> > > > > > > > > > > > > > problem with the producer id cache. I do wonder
> if
> > we
> > > > can
> > > > > > > make
> > > > > > > > > the
> > > > > > > > > > > > > default
> > > > > > > > > > > > > > a bit lower to reduce the chances that users will
> > hit
> > > > the
> > > > > > > same
> > > > > > > > > > memory
> > > > > > > > > > > > > > issues we have seen. After decoupling this
> > > > configuration
> > > > > > from
> > > > > > > > > > > > > > transactional.id.expiration.ms, the new timeout
> > just
> > > > > needs
> > > > > > > to
> > > > > > > > > > cover
> > > > > > > > > > > > the
> > > > > > > > > > > > > > longest duration that a producer might be
> retrying
> > > the
> > > > > same
> > > > > > > > > Produce
> > > > > > > > > > > > > > request. 7 days seems too high. Although I think
> it
> > > > could
> > > > > > go
> > > > > > > a
> > > > > > > > > fair
> > > > > > > > > > > > even
> > > > > > > > > > > > > > lower, perhaps 1 day is a reasonable place to
> > start?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Jason
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:25 AM Justine Olshan
> > > > > > > > > > > > > > <jols...@confluent.io.invalid>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hey Bill,
> > > > > > > > > > > > > > > Thanks! I was just going to say that hopefully
> > > > > > > > > > > > > > > transactional.id.expiration.ms would also be
> > over
> > > > the
> > > > > > > > delivery
> > > > > > > > > > > > > timeout.
> > > > > > > > > > > > > > :)
> > > > > > > > > > > > > > > Thanks for the +1!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Justine
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Jul 25, 2022 at 9:17 AM Bill Bejeck <
> > > > > > > > bbej...@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I just took another look at the KIP, and I
> > > realize
> > > > my
> > > > > > > > > > > > > > question/suggestion
> > > > > > > > > > > > > > > > about default values has already been
> addressed
> > > in
> > > > > the
> > > > > > > > > > > > > `Compatibility`
> > > > > > > > > > > > > > > > section.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'm +1 on the KIP.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -Bill
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 6:20 PM Bill Bejeck <
> > > > > > > > > bbej...@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Justine,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the well written KIP, this looks
> > > like
> > > > it
> > > > > > > will
> > > > > > > > > be a
> > > > > > > > > > > > > useful
> > > > > > > > > > > > > > > > > addition.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Overall the KIP looks good to me, I have
> one
> > > > > > > > > > question/comment.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > You mentioned that setting the `
> > > > > > > > producer.id.expiration.ms`
> > > > > > > > > > > less
> > > > > > > > > > > > > than
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > delivery timeout could lead to duplicates,
> > > which
> > > > > > makes
> > > > > > > > > sense.
> > > > > > > > > > > To
> > > > > > > > > > > > > > help
> > > > > > > > > > > > > > > > > avoid this situation, do we want to
> consider
> > a
> > > > > > default
> > > > > > > > > value
> > > > > > > > > > > that
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > same as the delivery timeout?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks again for the KIP.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Bill
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Thu, Jul 21, 2022 at 4:54 PM Justine
> > Olshan
> > > > > > > > > > > > > > > > > <jols...@confluent.io.invalid> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >> Hey all!
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> I'd like to start a discussion on my
> > proposal
> > > to
> > > > > > > > separate
> > > > > > > > > > > > > time-based
> > > > > > > > > > > > > > > > >> producer ID expiration from transactional
> ID
> > > > > > > expiration
> > > > > > > > by
> > > > > > > > > > > > > > > introducing a
> > > > > > > > > > > > > > > > >> new configuration.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> The KIP Is pretty small and simple, but
> will
> > > be
> > > > > > > helpful
> > > > > > > > in
> > > > > > > > > > > > > > controlling
> > > > > > > > > > > > > > > > >> memory usage in brokers -- especially now
> > that
> > > > by
> > > > > > > > default
> > > > > > > > > > > > > producers
> > > > > > > > > > > > > > > are
> > > > > > > > > > > > > > > > >> idempotent and create producer ID state.
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Please take a look and leave any comments
> > you
> > > > may
> > > > > > > have!
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> KIP:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-854+Separate+configuration+for+producer+ID+expiry
> > > > > > > > > > > > > > > > >> JIRA:
> > > > > > > https://issues.apache.org/jira/browse/KAFKA-14097
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> Thanks!
> > > > > > > > > > > > > > > > >> Justine
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to