Hi PoAn,

> 1. KAFKA-18905 or KAFKA-9199 are about leader changes cause OUT_OF_ORDER_SEQUENCE error. This KIP is to remove NUM_BATCHES_TO_RETAIN limitation. I think they’re not related.

OK, I see.

> Yes, if max.in.flight.requests.per.connection is larger than NUM_BATCHES_TO_RETAIN, the batches cannot be retained. That is why we have initial state to make sure the producer sends in flight requests less or equal to NUM_BATCHES_TO_RETAIN. Only if it finds a broker can retain more batches, it adjusts its limitation.

So, currently, when the idempotent/transactional producer is enabled, we throw an exception if max.in.flight.requests.per.connection > 5. Once we allow users to configure NUM_BATCHES_TO_RETAIN, that validation can no longer be applied before sending the produce request. That's why we need the produce response to tell the producer what the broker-side setting is. Could you make this clearer in the KIP?

Also, if max.in.flight.requests.per.connection is set to 100 and NUM_BATCHES_TO_RETAIN is 5, the first produce response arrives too late if we have already allowed the producer to send 100 requests in flight. If we want to adopt this solution, maybe we need to let the producer begin with max.in.flight.requests.per.connection = 1 and then adjust it to the expected value after the first produce response is received. Does that make sense?

> 4. We can adjust the default NUM_BATCHES_TO_RETAIN. However, if a broker works with old producers, it may waste memory. Old producers can't send more in flight requests cause of ConfigException. How about we still use 5 in 4.x and adjust to a larger value in 5.0?

Sounds good to me.

Thank you,
Luke

On Thu, Feb 26, 2026 at 9:22 PM PoAn Yang <[email protected]> wrote:

> Hi Luke,
>
> Thanks for the review and suggestions.
>
> 1. KAFKA-18905 or KAFKA-9199 are about leader changes cause
> OUT_OF_ORDER_SEQUENCE error. This KIP is to remove
> NUM_BATCHES_TO_RETAIN limitation. I think they’re not related.
>
> 2. Agree, transactional producers are based on idempotent producers.
> Updated it.
>
> 3.
> > So, I'd like to know why we have to adjust the
> > `max.in.flight.requests.per.connection` value in the producer side?
>
> User doesn’t need to update max.in.flight.requests.per.connection in
> this case. The producer will automatically adjust internal limitation of
> in flight requests.
>
> > Using the example above, after this KIP,
> > the `max.in.flight.requests.per.connection=10` cannot be retained
> > unless NUM_BATCHES_TO_RETAIN is set to 10, right?
>
> Yes, if max.in.flight.requests.per.connection is larger than
> NUM_BATCHES_TO_RETAIN, the batches cannot be retained.
> That is why we have initial state to make sure the producer sends
> in flight requests less or equal to NUM_BATCHES_TO_RETAIN.
> Only if it finds a broker can retain more batches, it adjusts its
> limitation.
>
> 4. We can adjust the default NUM_BATCHES_TO_RETAIN. However,
> if a broker works with old producers, it may waste memory. Old
> producers can't send more in flight requests cause of ConfigException.
> How about we still use 5 in 4.x and adjust to a larger value in 5.0?
>
> Thank you,
> PoAn
>
>
> > On Feb 25, 2026, at 9:07 PM, Luke Chen <[email protected]> wrote:
> >
> > Hi PoAn,
> >
> > Thanks for the KIP!
> > I agree the number of batches to retain should be configurable to improve
> > the throughput.
> >
> > Comments:
> > 1. Could you add the issue: KAFKA-18905
> > <https://issues.apache.org/jira/browse/KAFKA-18905> into the
> > motivation section? I think this is the issue we want to address, right?
> >
> > 2. > Introduce a new config on the broker, as the broker must know how much
> > memory to allocate. Operators can set a limitation on the broker side to
> > prevent malicious producers. This configuration only takes effect for
> > idempotent producers.
> > I think not only the idempotent producers, but also the
> > transactional producers, as long as they have the PID.
> >
> > 3. About the producer response update, I'm wondering if it is necessary?
> > Currently, when producer with `max.in.flight.requests.per.connection=10`
> > and NUM_BATCHES_TO_RETAIN=5, we won't adjust the producer config to 5.
> > Of course it is possible to the duplication cannot be detected, but that
> > might be user's choice to improve the throughput (though it might be rare).
> > So, I'd like to know why we have to adjust the
> > `max.in.flight.requests.per.connection` value in the producer side?
> > Using the example above, after this KIP,
> > the `max.in.flight.requests.per.connection=10` cannot be retained
> > unless NUM_BATCHES_TO_RETAIN is set to 10, right?
> >
> > 4. The default value of `max.idempotence.batches.to.retain`
> > In the performance test you showed, it obviously shows
> > larger `max.idempotence.batches.to.retain` will get better throughput.
> > Also, the memory usage is small, do we have any reason we keep the default
> > value for 5?
> >
> > Thank you,
> > Luke
> >
> >
> >
> > On Sun, Feb 22, 2026 at 9:48 PM PoAn Yang <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I would like to start a discussion thread on KIP-1269. In this KIP, we aim
> >> to remove limitation of maximal number of batches to retain for a
> >> idempotent producer. In our test, it can improve throughput and reduce
> >> latency.
> >>
> >> https://cwiki.apache.org/confluence/x/loI8G
> >>
> >> Please take a look and feel free to share any thoughts.
> >>
> >> Thanks.
> >> PoAn
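
For reference, the ramp-up idea raised at the top of this thread (start with a small in-flight limit, then widen it once the first produce response reveals how many batches the broker retains) could look roughly like the Java sketch below. This is only an illustration under stated assumptions: `InFlightLimiter`, `tryAcquire`, `onResponse`, and `effectiveLimit` are hypothetical names, not Kafka client classes or methods, and how the broker-side value is carried in the produce response is left to the KIP.

```java
/**
 * Hypothetical sketch, not part of the Kafka client.
 * Tracks how many produce requests may be in flight on one connection.
 */
final class InFlightLimiter {
    // The user's max.in.flight.requests.per.connection setting.
    private final int configuredMax;
    // The limit the producer currently enforces; starts conservative.
    private int effectiveLimit;
    // Requests sent but not yet answered.
    private int inFlight;

    InFlightLimiter(int configuredMax, int initialLimit) {
        if (configuredMax < 1 || initialLimit < 1) {
            throw new IllegalArgumentException("limits must be >= 1");
        }
        this.configuredMax = configuredMax;
        // Never start above what the user asked for.
        this.effectiveLimit = Math.min(configuredMax, initialLimit);
    }

    /** Returns true and reserves a slot if another request may be sent now. */
    synchronized boolean tryAcquire() {
        if (inFlight >= effectiveLimit) {
            return false;
        }
        inFlight++;
        return true;
    }

    /**
     * Called when a produce response arrives. brokerBatchesToRetain is the
     * value advertised by the broker (assumed to be carried in the response).
     */
    synchronized void onResponse(int brokerBatchesToRetain) {
        if (inFlight > 0) {
            inFlight--;
        }
        // Cap by both the user's setting and what the broker can retain.
        effectiveLimit = Math.max(1, Math.min(configuredMax, brokerBatchesToRetain));
    }

    synchronized int effectiveLimit() {
        return effectiveLimit;
    }
}
```

With `new InFlightLimiter(100, 1)`, only one request is allowed until the first response arrives; if that response advertises 5, the limit becomes 5 even though the producer is configured for 100. Passing 5 as the initial limit instead would mirror the "initial state" described in the quoted reply.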
