Re: [DISCUSS] KIP-1222: Acquisition lock timeout renewal in share consumer explicit mode

Sushant Mahajan Sat, 18 Oct 2025 05:46:09 -0700

Hi Sanskar,
Thanks for the feedback. I have added and additional section to proposed
changes to highlight the issue and some debugging actions.


Regards,
Sushant Mahajan

On Wed, 15 Oct 2025, 01:53 Lianet M., <[email protected]> wrote:

> Hi Sushant, thanks for the KIP.
>
> LM1: About the validation for brokers that may not support this
> feature, the KIP proposes to validate on the "acknowledge" call, but
> that does not build any request or have any relationship with the
> NetworkClient (where the ApiVersions info lives), so how exactly are
> we imagining we would have the node version to check on acknowledge?
> It's usually when we build the request that we have the info handy, to
> check if the Node version is not what the client needs for a feature
> it is using. So here it would mean that the error would be discovered
> on the call to poll or commit following the acknowledge I expect. I do
> agree that it makes more sense to throw it on ack, but just pointing
> out the challenge to make sure we thought about it.
>
> LM2: About the IllegalArgumentException when using an ack type not
> supported by the broker, I wonder if it should be
> UnsupportedVersionException? I think this is what we consistently
> throw when the client attempts to use a feature not supported by the
> broker version. It's actually a common pattern on the client (related
> to both comments LM1 and LM2): check node version when building the
> request and throwing UnsupportedVersionException depending on a
> feature being used:  ex.
>
> https://github.com/apache/kafka/blob/4cb5f1ca124b810b32fcef1ca23f00faf70603b6/clients/src/main/java/org/apache/kafka/common/requests/ListGroupsRequest.java#L52
>
> LM3: About the client-side buffering proposed to be able to return the
> renewed records on poll, I wonder if we should be concerned about
> potential memory issues? I expect that an app would easily end up
> calling RENEW continuously (common if it's failing to process every
> record for a period of time?). In that case, how would that buffer be
> bound? The KIP mentions that it would keep records "until they are
> re-delivered by the broker", so does it mean that the buffer would
> continue growing until the client app is able to ACCEPT or REJECT
> records?
>
> LM4: On the new client-side buffer too, did we consider the
> alternative of no extra buffering on the client, but just have the
> broker re-send the renewed records to the member that has them? I'm
> probably missing some broker-side implications of the share fetch, but
> curious to understand the trade-offs (looking at it from the client
> side, my first reaction is to avoid adding another buffering on the
> client if possible, due to the complexities and implications, like
> memory issues).
>
> Thanks!
> Lianet
>
> On Tue, Oct 14, 2025 at 6:14 AM Sushant Mahajan <[email protected]>
> wrote:
> >
> > Hi Andrew,
> > I have modified the kip to throw IllegalArgumentException on unsupported
> > ack type.
> >
> > Regards,
> > Sushant Mahajan
> >
> > On Mon, 13 Oct 2025, 18:35 Sanskar Jhajharia,
> > <[email protected]> wrote:
> >
> > > Hey Sushant,
> > > Thanks for the KIP. Overall, the KIP looks like a good enhancement to
> the
> > > Share Consumer interface and helps unlock a significant use case. A
> couple
> > > of questions or suggestions:
> > >
> > > SJ1: If a record is a "poison pill" (a record that consistently causes
> a
> > > consumer to require a very long processing timeout), a malicious or
> buggy
> > > consumer could be programmed to continuously renew the lock to prevent
> it
> > > from ever reaching a failure state on the consumer and being moved to a
> > > REJECTED or ARCHIVED state in the broker. Unlimited renewals can
> create a
> > > self-inflicted, persistent Head-of-Line (HOL) blocking issue for the
> entire
> > > share consumer group. Are there broker-side limits (e.g., a maximum
> number
> > > of renewals or a maximum cumulative lock duration) that can be
> configured
> > > to force a difficult record to eventually transition?
> > >
> > > SJ2: This is somehow linked to the comment `SJ1`. Would it make sense
> to
> > > improve visibility into lock renewal activity for better debugging and
> > > operational awareness? For example, exposing broker-level metrics or
> > > diagnostics that help operators identify cases of frequent or
> ineffective
> > > renewals could be valuable in understanding consumer or record-level
> > > issues. These metrics may be essential for operators to debug
> long-running
> > > jobs, identify bottlenecks, and properly configure consumer timeouts.
> > >
> > > Regards,
> > > Sanskar
> > >
> > > On Tue, Oct 7, 2025 at 3:46 PM Sushant Mahajan <[email protected]>
> > > wrote:
> > >
> > > > I’d like to start the discussion for KIP-1222: Acquisition lock
> timeout
> > > > renewal in share consumer explicit mode.
> > > >
> > > > JIRA: https://issues.apache.org/jira/browse/KAFKA-19742
> > > > KIP Wiki:
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1222%3A+Acquisition+lock+timeout+renewal+in+share+consumer+explicit+mode
> > > >
> > > > Regards,
> > > > Sushant Mahajan
> > > >
> > >
>

Re: [DISCUSS] KIP-1222: Acquisition lock timeout renewal in share consumer explicit mode

Reply via email to