Re: [DISCUSS] KIP-1038: Add Custom Error Handler to Producer

Federico Valeri Wed, 15 May 2024 00:31:20 -0700

Hello Alieh, thanks for this useful KIP.

There is a typo in the motivation when you talk about the
UnknownTopicOrPartitionException. It's delivery.timeout.ms, not
deliver.timeout.ms.


In the past, I did some work to improve and clean the official Kafka
examples, which I think are useful for new Kafka users. I was
wondering if it is worth to improve them in order to show the correct
usage of this new interface. If you agree, maybe we could mention this
in the proposed changes.

> The accepted responses for RecordTooLargeException are FAIL and SWALLOW. 
> Therefore, RETRY will be interpreted and executed as FAIL.

Why do we need this javadoc note? I think it's not possible to return
RETRY in the current form.

When we talk about swallowing in the default implementation, I think
we will log an error/warning and drop the record right? If yes, should
we clarify this and improve the DROP_INVALID_LARGE_RECORDS_DOC by
mentioning the logging part?

Should we mention somewhere which logic takes precedence when both the
interface and configs are used?

On Tue, May 14, 2024 at 4:45 PM Chris Egerton <chr...@aiven.io.invalid> wrote:
>
> Hi Alieh,
>
> Thank you for all the updates! One final question--how will the retry
> timeout for unknown topic partition errors be implemented? I think it would
> be best if this could be done with an implementation of the error handler,
> but I don't see a way to track the necessary information with the
> current ProducerExceptionHandler interface.
>
> Cheers,
>
> Chris
>
> On Tue, May 14, 2024 at 9:10 AM Alieh Saeedi <asae...@confluent.io.invalid>
> wrote:
>
> > Thanks Andrew. Done :)
> >
> > @Chris: I changed the config parameter type from boolean to integer, which
> > defines the timeout for retrying. I thought reusing `max.block.ms` was not
> > reasonable as you mentioned.
> >
> > So if the KIP looks good, let 's skip to the good part ;-) VOTING :)
> >
> > Bests,
> > Alieh
> >
> >
> >
> >
> >
> > On Tue, May 14, 2024 at 12:26 PM Andrew Schofield <
> > andrew_schofi...@live.com>
> > wrote:
> >
> > > Hi Alieh,
> > > Just one final comment.
> > >
> > > [AJS5] Existing classes use Retriable, not Retryable. For example:
> > >
> > >
> > https://kafka.apache.org/21/javadoc/org/apache/kafka/common/errors/RetriableException.html
> > >
> > > I suggest RetriableResponse and NonRetriableResponse.
> > >
> > > Thanks,
> > > Andrew
> > >
> > > > On 13 May 2024, at 23:17, Alieh Saeedi <asae...@confluent.io.INVALID>
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > >
> > > > Thanks for all the valid points you listed.
> > > >
> > > >
> > > > KIP updates and addressing concerns:
> > > >
> > > >
> > > > 1) The KIP now suggests two Response types: `RetryableResponse` and
> > > > `NonRetryableResponse`
> > > >
> > > >
> > > > 2) `custom.exception.handler` is changed to
> > > `custom.exception.handler.class`
> > > >
> > > >
> > > > 3) The KIP clarifies that `In the case of an implemented handler for
> > the
> > > > specified exception, the handler takes precedence.`
> > > >
> > > >
> > > > 4)  There is now a `default` implementation for both handle() methods.
> > > >
> > > >
> > > > 5)  @Chris: for `UnknownTopicOrPartition`, the default is already
> > > retrying
> > > > for 60s. (In fact, the default value of `max.block.ms`). If the
> > handler
> > > > instructs to FAIL or SWALLOW, there will be no retry, and if the
> > handler
> > > > instructs to RETRY, that will be the default behavior, which follows
> > the
> > > > values in already existing config parameters such as `max.block.ms`.
> > > Does
> > > > that make sense?
> > > >
> > > >
> > > > Hope the changes and explanations are convincing :)
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > Alieh
> > > >
> > > > On Mon, May 13, 2024 at 6:40 PM Justine Olshan
> > > <jols...@confluent.io.invalid>
> > > > wrote:
> > > >
> > > >> Oh I see. The type isn't the error type but a newly defined type for
> > the
> > > >> response. Makes sense and works for me.
> > > >>
> > > >> Justine
> > > >>
> > > >> On Mon, May 13, 2024 at 9:13 AM Chris Egerton <
> > fearthecel...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> If we have dedicated methods for each kind of exception
> > > >>> (handleRecordTooLarge, handleUnknownTopicOrPartition, etc.), doesn't
> > > that
> > > >>> provide sufficient constraint? I'm not suggesting we eliminate these
> > > >>> methods, just that we change their return types to something more
> > > >> flexible.
> > > >>>
> > > >>> On Mon, May 13, 2024, 12:07 Justine Olshan
> > > <jols...@confluent.io.invalid
> > > >>>
> > > >>> wrote:
> > > >>>
> > > >>>> I'm not sure I agree with the Retriable and NonRetriableResponse
> > > >> comment.
> > > >>>> This doesn't limit the blast radius or enforce certain errors are
> > > used.
> > > >>>> I think we might disagree on how controlled these interfaces can
> > be...
> > > >>>>
> > > >>>> Justine
> > > >>>>
> > > >>>> On Mon, May 13, 2024 at 8:40 AM Chris Egerton
> > <chr...@aiven.io.invalid
> > > >>>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Hi Alieh,
> > > >>>>>
> > > >>>>> Thanks for the updates! I just have a few more thoughts:
> > > >>>>>
> > > >>>>> - I don't think a boolean property is sufficient to dictate retries
> > > >> for
> > > >>>>> unknown topic partitions, though. These errors can occur if a topic
> > > >> has
> > > >>>>> just been created, which can occur if, for example, automatic topic
> > > >>>>> creation is enabled for a multi-task connector. This is why I
> > > >> proposed
> > > >>> a
> > > >>>>> timeout instead of a boolean (and see my previous email for why
> > > >>> reducing
> > > >>>>> max.block.ms for a producer is not a viable alternative). If it
> > > >> helps,
> > > >>>> one
> > > >>>>> way to reproduce this yourself is to add the line
> > > >>>>> `fooProps.put(TASKS_MAX_CONFIG, "10");` to the integration test
> > here:
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > https://github.com/apache/kafka/blob/5439914c32fa00d634efa7219699f1bc21add839/connect/runtime/src/test/java/org/apache/kafka/connect/integration/SourceConnectorsIntegrationTest.java#L134
> > > >>>>> and then check the logs afterward for messages like "Error while
> > > >>> fetching
> > > >>>>> metadata with correlation id <n> :
> > > >>>> {foo-topic=UNKNOWN_TOPIC_OR_PARTITION}".
> > > >>>>>
> > > >>>>> - I also don't think we need custom XxxResponse enums for every
> > > >>> possible
> > > >>>>> method; it seems like this will lead to a lot of duplication and
> > > >>>> cognitive
> > > >>>>> overhead if we want to expand the error handler in the future.
> > > >>> Something
> > > >>>>> more flexible like RetriableResponse and NonRetriableResponse could
> > > >>>>> suffice.
> > > >>>>>
> > > >>>>> - Finally, the KIP still doesn't state how the handler will or
> > won't
> > > >>> take
> > > >>>>> precedence over existing retry properties. If I set `retries` or `
> > > >>>>> delivery.timeout.ms` or `max.block.ms` to low values, will that
> > > >> cause
> > > >>>>> retries to cease even if my custom handler would otherwise keep
> > > >>> returning
> > > >>>>> RETRY for an error?
> > > >>>>>
> > > >>>>> Cheers,
> > > >>>>>
> > > >>>>> Chris
> > > >>>>>
> > > >>>>> On Mon, May 13, 2024 at 11:02 AM Andrew Schofield <
> > > >>>>> andrew_schofi...@live.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi Alieh,
> > > >>>>>> Just a few more comments on the KIP. It is looking much less risky
> > > >>> now
> > > >>>>> the
> > > >>>>>> scope
> > > >>>>>> is tighter.
> > > >>>>>>
> > > >>>>>> [AJS1] It would be nice to have default implementations of the
> > > >> handle
> > > >>>>>> methods
> > > >>>>>> so an implementor would not need to implement both themselves.
> > > >>>>>>
> > > >>>>>> [AJS2] Producer configurations which are class names usually end
> > in
> > > >>>>>> “.class”.
> > > >>>>>> I suggest “custom.exception.handler.class”.
> > > >>>>>>
> > > >>>>>> [AJS3] If I implemented a handler, and I set a non-default value
> > > >> for
> > > >>>> one
> > > >>>>>> of the
> > > >>>>>> new configuations, what happens? I would expect that the handler
> > > >>> takes
> > > >>>>>> precedence. I wasn’t quite clear what “the control will follow the
> > > >>>>> handler
> > > >>>>>> instructions” meant.
> > > >>>>>>
> > > >>>>>> [AJS4] Because you now have an enum for the
> > > >>>>>> RecordTooLargeExceptionResponse,
> > > >>>>>> I don’t think you need to state in the comment for
> > > >>>>>> ProducerExceptionHandler that
> > > >>>>>> RETRY will be interpreted as FAIL.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Andrew
> > > >>>>>>
> > > >>>>>>> On 13 May 2024, at 14:53, Alieh Saeedi
> > > >>> <asae...@confluent.io.INVALID
> > > >>>>>
> > > >>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>> Hi all,
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Thanks for the very interesting discussion during my PTO.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> KIP updates and addressing concerns:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> 1) Two handle() methods are defined in ProducerExceptionHandler
> > > >> for
> > > >>>> the
> > > >>>>>> two
> > > >>>>>>> exceptions with different input parameters so that we have
> > > >>>>>>> handle(RecordTooLargeException e, ProducerRecord record) and
> > > >>>>>>> handle(UnknownTopicOrPartitionException e, ProducerRecord record)
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> 2) The ProducerExceptionHandler extends `Closable` as well.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> 3) The KIP suggests having two more configuration parameters with
> > > >>>>> boolean
> > > >>>>>>> values:
> > > >>>>>>>
> > > >>>>>>> - `drop.invalid.large.records` with a default value of `false`
> > > >> for
> > > >>>>>>> swallowing too large records.
> > > >>>>>>>
> > > >>>>>>> - `retry.unknown.topic.partition` with a default value of `true`
> > > >>> that
> > > >>>>>>> performs RETRY for `max.block.ms` ms, encountering the
> > > >>>>>>> UnknownTopicOrPartitionException.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Hope the main concerns are addressed so that we can go forward
> > > >> with
> > > >>>>>> voting.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> Cheers,
> > > >>>>>>>
> > > >>>>>>> Alieh
> > > >>>>>>>
> > > >>>>>>> On Thu, May 9, 2024 at 11:25 PM Artem Livshits
> > > >>>>>>> <alivsh...@confluent.io.invalid> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi Mathias,
> > > >>>>>>>>
> > > >>>>>>>>> [AL1] While I see the point, I would think having a different
> > > >>>>> callback
> > > >>>>>>>> for every exception might not really be elegant?
> > > >>>>>>>>
> > > >>>>>>>> I'm not sure how to assess the level of elegance of the
> > > >> proposal,
> > > >>>> but
> > > >>>>> I
> > > >>>>>> can
> > > >>>>>>>> comment on the technical characteristics:
> > > >>>>>>>>
> > > >>>>>>>> 1. Having specific interfaces that codify the logic that is
> > > >>>> currently
> > > >>>>>>>> prescribed in the comments reduce the chance of making a
> > > >> mistake.
> > > >>>>>>>> Commments may get ignored, misuderstood or etc. but if the
> > > >>> contract
> > > >>>> is
> > > >>>>>>>> codified, the compilier will help to enforce the contract.
> > > >>>>>>>> 2. Given that the logic is trickier than it seems (the
> > > >>>>> record-too-large
> > > >>>>>> is
> > > >>>>>>>> an example that can easily confuse someone who's not intimately
> > > >>>>> familiar
> > > >>>>>>>> with the nuances of the batching logic), having a little more
> > > >>> hoops
> > > >>>> to
> > > >>>>>> jump
> > > >>>>>>>> would give a greater chance that whoever tries to add a new
> > > >> cases
> > > >>>>> pauses
> > > >>>>>>>> and thinks a bit more.
> > > >>>>>>>> 3. As Justine pointed out, having different method will be a
> > > >>> forcing
> > > >>>>>>>> function to go through a KIP rather than smuggle new cases
> > > >> through
> > > >>>>>>>> implementation.
> > > >>>>>>>> 4. Sort of a consequence of the previous 3 -- all those things
> > > >>>> reduce
> > > >>>>>> the
> > > >>>>>>>> chance of someone writing the code that works with 2 errors and
> > > >>> then
> > > >>>>>> when
> > > >>>>>>>> more errors are added in the future will suddenly incorrectly
> > > >>> ignore
> > > >>>>> new
> > > >>>>>>>> errors (the example I gave in the previous email).
> > > >>>>>>>>
> > > >>>>>>>>> [AL2 cont.] Similar to AL1, I see such a handler to some extend
> > > >>> as
> > > >>>>>>>> business logic. If a user puts a bad filter condition in their
> > > >> KS
> > > >>>> app,
> > > >>>>>> and
> > > >>>>>>>> drops messages
> > > >>>>>>>>
> > > >>>>>>>> I agree that there is always a chance to get a bug and lose
> > > >>>> messages,
> > > >>>>>> but
> > > >>>>>>>> there are generally separation of concerns that has different
> > > >> risk
> > > >>>>>> profile:
> > > >>>>>>>> the filtering logic may be more rigorously tested and rarely
> > > >>> changed
> > > >>>>>> (say
> > > >>>>>>>> an application developer does it), but setting the topics to
> > > >>> produce
> > > >>>>>> may be
> > > >>>>>>>> done via configuration (e.g. a user of the application does it)
> > > >>> and
> > > >>>>> it's
> > > >>>>>>>> generally an expectation that users would get an error when
> > > >>>>>> configuration
> > > >>>>>>>> is incorrect.
> > > >>>>>>>>
> > > >>>>>>>> What could be worse is that UnknownTopicOrPartitionException can
> > > >>> be
> > > >>>> an
> > > >>>>>>>> intermittent error, i.e. with a generally correct configuration,
> > > >>>> there
> > > >>>>>>>> could be metadata propagation problem on the cluster and then a
> > > >>>> random
> > > >>>>>> set
> > > >>>>>>>> of records could get lost.
> > > >>>>>>>>
> > > >>>>>>>>> [AL3] Maybe I misunderstand what you are saying, but to me,
> > > >>>> checking
> > > >>>>>> the
> > > >>>>>>>> size of the record upfront is exactly what the KIP proposes? No?
> > > >>>>>>>>
> > > >>>>>>>> It achieves the same result but solves it differently, my
> > > >>> proposal:
> > > >>>>>>>>
> > > >>>>>>>> 1. Application checks the validity of a record (maybe via a new
> > > >>>>>>>> validateRecord method) before producing it, and can just exclude
> > > >>> it
> > > >>>> or
> > > >>>>>>>> return an error to the user.
> > > >>>>>>>> 2. Application produces the record -- at this point there are no
> > > >>>>> records
> > > >>>>>>>> that could return record too large, they were either skipped at
> > > >>>> step 1
> > > >>>>>> or
> > > >>>>>>>> we didn't get here because step 1 failed.
> > > >>>>>>>>
> > > >>>>>>>> Vs. KIP's proposal
> > > >>>>>>>>
> > > >>>>>>>> 1. Application produces the record.
> > > >>>>>>>> 2. Application gets a callback.
> > > >>>>>>>> 3. Application returns the action on how to proceed.
> > > >>>>>>>>
> > > >>>>>>>> The advantage of the former is the clarity of semantics -- the
> > > >>>> record
> > > >>>>> is
> > > >>>>>>>> invalid (property of the record, not a function of server state
> > > >> or
> > > >>>>>> server
> > > >>>>>>>> configuration) and we can clearly know that it is the record
> > > >> that
> > > >>> is
> > > >>>>> bad
> > > >>>>>>>> and can never succeed.
> > > >>>>>>>>
> > > >>>>>>>> The KIP-proposed way actually has a very tricky point: it
> > > >> actually
> > > >>>>>> handles
> > > >>>>>>>> a subset of record-too-large exceptions.  The broker can return
> > > >>>>>>>> record-too-large and reject the whole batch (but we don't want
> > > >> to
> > > >>>>> ignore
> > > >>>>>>>> those because then we can skip random records that just happened
> > > >>> to
> > > >>>> be
> > > >>>>>> in
> > > >>>>>>>> the same batch), in some sense we use the same error for 2
> > > >>> different
> > > >>>>>>>> conditions and understanding that requires pretty deep
> > > >>> understanding
> > > >>>>> of
> > > >>>>>>>> Kafka internals.
> > > >>>>>>>>
> > > >>>>>>>> -Artem
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Wed, May 8, 2024 at 9:47 AM Justine Olshan
> > > >>>>>> <jols...@confluent.io.invalid
> > > >>>>>>>>>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> My concern with respect to it being fragile: the code that
> > > >>> ensures
> > > >>>>> the
> > > >>>>>>>>> error type is internal to the producer. Someone may see it and
> > > >>>> say, I
> > > >>>>>>>> want
> > > >>>>>>>>> to add such and such error. This looks like internal code, so I
> > > >>>> don't
> > > >>>>>>>> need
> > > >>>>>>>>> a KIP, and then they can change it to whatever they want
> > > >> thinking
> > > >>>> it
> > > >>>>> is
> > > >>>>>>>>> within the typical kafka improvement protocol.
> > > >>>>>>>>>
> > > >>>>>>>>> Relying on an internal change to enforce an external API is
> > > >>> fragile
> > > >>>>> in
> > > >>>>>> my
> > > >>>>>>>>> opinion. That's why I sort of agreed with Artem with enforcing
> > > >>> the
> > > >>>>>> error
> > > >>>>>>>> in
> > > >>>>>>>>> the method signature -- part of the public API.
> > > >>>>>>>>>
> > > >>>>>>>>> Chris's comments on requiring more information to handler again
> > > >>>> makes
> > > >>>>>> me
> > > >>>>>>>>> wonder if we are solving a problem of lack of information at
> > > >> the
> > > >>>>>>>>> application level with a more powerful solution than we need.
> > > >>> (Ie,
> > > >>>> if
> > > >>>>>> we
> > > >>>>>>>>> had more information, could the application close and restart
> > > >> the
> > > >>>>>>>>> transaction rather than having to drop records) But I am happy
> > > >> to
> > > >>>>>>>>> compromise with a handler that we can agree is sufficiently
> > > >>>>> controlled
> > > >>>>>>>> and
> > > >>>>>>>>> documented.
> > > >>>>>>>>>
> > > >>>>>>>>> Justine
> > > >>>>>>>>>
> > > >>>>>>>>> On Wed, May 8, 2024 at 7:20 AM Chris Egerton
> > > >>>> <chr...@aiven.io.invalid
> > > >>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi Alieh,
> > > >>>>>>>>>>
> > > >>>>>>>>>> Continuing prior discussions:
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1) Regarding the "flexibility" discussion, my overarching
> > > >> point
> > > >>> is
> > > >>>>>>>> that I
> > > >>>>>>>>>> don't see the point in allowing for this kind of pluggable
> > > >> logic
> > > >>>>>>>> without
> > > >>>>>>>>>> also covering more scenarios. Take example 2 in the KIP: if
> > > >>> we're
> > > >>>>>> going
> > > >>>>>>>>> to
> > > >>>>>>>>>> implement retries only on "important" topics when a topic
> > > >>>> partition
> > > >>>>>>>> isn't
> > > >>>>>>>>>> found, why wouldn't we also want to be able to do this for
> > > >> other
> > > >>>>>>>> errors?
> > > >>>>>>>>>> Again, taking authorization errors as an example, why wouldn't
> > > >>> we
> > > >>>>> want
> > > >>>>>>>> to
> > > >>>>>>>>>> be able to fail when we can't write to "important" topics
> > > >>> because
> > > >>>>> the
> > > >>>>>>>>>> producer principal lacks sufficient ACLs, and drop the record
> > > >> if
> > > >>>> the
> > > >>>>>>>>> topic
> > > >>>>>>>>>> isn't "important"? In a security-conscious environment with
> > > >>>>>>>>>> runtime-dependent topic routing (which is a common feature of
> > > >>> many
> > > >>>>>>>> source
> > > >>>>>>>>>> connectors, such as the Debezium connectors), this seems
> > > >> fairly
> > > >>>>>> likely.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 2) As far as changing the shape of the API goes, I like
> > > >> Artem's
> > > >>>> idea
> > > >>>>>> of
> > > >>>>>>>>>> splitting out the interface based on specific exceptions. This
> > > >>> may
> > > >>>>> be
> > > >>>>>> a
> > > >>>>>>>>>> little laborious to expand in the future, but if we really
> > > >> want
> > > >>> to
> > > >>>>>>>>>> limit the exceptions that we cover with the handler and move
> > > >>>> slowly
> > > >>>>>> and
> > > >>>>>>>>>> cautiously, then IMO it'd be reasonable to reflect that in the
> > > >>>>>>>>> interface. I
> > > >>>>>>>>>> also acknowledge that there's no way to completely prevent
> > > >>> people
> > > >>>>> from
> > > >>>>>>>>>> shooting themselves in the foot by implementing the API
> > > >>>> incorrectly,
> > > >>>>>>>> but
> > > >>>>>>>>> I
> > > >>>>>>>>>> think it's worth it to do what we can--including leveraging
> > > >> the
> > > >>>> Java
> > > >>>>>>>>>> language's type system--to help them, so IMO there's value to
> > > >>>>>>>> eliminating
> > > >>>>>>>>>> the implicit behavior of failing when a policy returns RETRY
> > > >>> for a
> > > >>>>>>>>>> non-retriable error. This can take a variety of shapes and I'm
> > > >>> not
> > > >>>>>>>> going
> > > >>>>>>>>> to
> > > >>>>>>>>>> insist on anything specific, but I do want to again raise my
> > > >>>>> concerns
> > > >>>>>>>>> with
> > > >>>>>>>>>> the current proposal and request that we find something a
> > > >> little
> > > >>>>>>>> better.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 3) Concerning the default implementation--actually, I meant
> > > >>> what I
> > > >>>>>>>> wrote
> > > >>>>>>>>> :)
> > > >>>>>>>>>> I don't want a "second" default, I want an implementation of
> > > >>> this
> > > >>>>>>>>> interface
> > > >>>>>>>>>> to be used as the default if no others are specified. The
> > > >>> behavior
> > > >>>>> of
> > > >>>>>>>>> this
> > > >>>>>>>>>> default implementation would be identical to existing behavior
> > > >>> (so
> > > >>>>>>>> there
> > > >>>>>>>>>> would be no backwards compatibility concerns like the ones
> > > >>> raised
> > > >>>> by
> > > >>>>>>>>>> Matthias), but it would be possible to configure this default
> > > >>>>> handler
> > > >>>>>>>>> class
> > > >>>>>>>>>> to behave differently for a basic set of scenarios. This would
> > > >>>>> mirror
> > > >>>>>>>>> (pun
> > > >>>>>>>>>> intended) the approach we've taken with Mirror Maker 2 and its
> > > >>>>>>>>>> ReplicationPolicy interface [1]. There is a default
> > > >>> implementation
> > > >>>>>>>>>> available [2] that recognizes a handful of basic configuration
> > > >>>>>>>> properties
> > > >>>>>>>>>> [3] for simple tweaks, but if users want, they can also
> > > >>> implement
> > > >>>>>> their
> > > >>>>>>>>> own
> > > >>>>>>>>>> replication policy for more fine-grained logic if those
> > > >>> properties
> > > >>>>>>>> aren't
> > > >>>>>>>>>> flexible enough.
> > > >>>>>>>>>>
> > > >>>>>>>>>> More concretely, I'm imagining something like this for the
> > > >>>> producer
> > > >>>>>>>>>> exception handler:
> > > >>>>>>>>>>
> > > >>>>>>>>>> - Default implementation class
> > > >>>>>>>>>> of
> > > >>>> org.apache.kafka.clients.producer.DefaultProducerExceptionHandler
> > > >>>>>>>>>> - This class would recognize two properties:
> > > >>>>>>>>>> - drop.invalid.large.records: Boolean property, defaults to
> > > >>>> false.
> > > >>>>> If
> > > >>>>>>>>>> "false", then causes the handler to return FAIL whenever
> > > >>>>>>>>>> a RecordTooLargeException is encountered; if "true", then
> > > >> causes
> > > >>>>>>>>>> SWALLOW/SKIP/DROP to be returned instead
> > > >>>>>>>>>> - unknown.topic.partition.retry.timeout.ms: Integer
> > > >> property,
> > > >>>>>>>> defaults
> > > >>>>>>>>>> to
> > > >>>>>>>>>> INT_MAX. Whenever an UnknownTopicOrPartitionException is
> > > >>>>> encountered,
> > > >>>>>>>>>> causes the handler to return FAIL if that record has been
> > > >>> pending
> > > >>>>> for
> > > >>>>>>>>> more
> > > >>>>>>>>>> than the retry timeout; otherwise, causes RETRY to be returned
> > > >>>>>>>>>>
> > > >>>>>>>>>> I think this is worth addressing now instead of later because
> > > >> it
> > > >>>>>> forces
> > > >>>>>>>>> us
> > > >>>>>>>>>> to evaluate the usefulness of this interface and it addresses
> > > >> a
> > > >>>>>>>>>> long-standing issue not just with Kafka Connect, but with the
> > > >>> Java
> > > >>>>>>>>> producer
> > > >>>>>>>>>> in general. For reference, here are a few tickets I collected
> > > >>>> after
> > > >>>>>>>>> briefly
> > > >>>>>>>>>> skimming our Jira showing that this is a real pain point for
> > > >>>> users:
> > > >>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-10340,
> > > >>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-12990,
> > > >>>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-13634. Although
> > > >>> this
> > > >>>> is
> > > >>>>>>>>>> frequently reported with Kafka Connect, it applies to anyone
> > > >> who
> > > >>>>>>>>> configures
> > > >>>>>>>>>> a producer to use a high retry timeout. I am aware of the
> > > >>>>>> max.block.ms
> > > >>>>>>>>>> property, but it's painful and IMO poor behavior to require
> > > >>> users
> > > >>>> to
> > > >>>>>>>>> reduce
> > > >>>>>>>>>> the value of this property just to handle the single scenario
> > > >>> when
> > > >>>>>>>> trying
> > > >>>>>>>>>> to write to topics that don't exist, since it would also limit
> > > >>> the
> > > >>>>>>>> retry
> > > >>>>>>>>>> timeout for other operations that are legitimately retriable.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Raising new points:
> > > >>>>>>>>>>
> > > >>>>>>>>>> 5) I don't see the interplay between this handler and existing
> > > >>>>>>>>>> retry-related properties mentioned anywhere in the KIP. I'm
> > > >>>> assuming
> > > >>>>>>>> that
> > > >>>>>>>>>> properties like "retries", "max.block.ms", and "
> > > >>>> delivery.timeout.ms
> > > >>>>> "
> > > >>>>>>>>> would
> > > >>>>>>>>>> take precedence over the handler and once they are exhausted,
> > > >>> the
> > > >>>>>>>>>> record/batch will fail no matter what? If so, it's probably
> > > >>> worth
> > > >>>>>>>> briefly
> > > >>>>>>>>>> mentioning this (no more than a sentence or two) in the KIP,
> > > >> and
> > > >>>> if
> > > >>>>>>>> not,
> > > >>>>>>>>>> I'm curious what you have in mind.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 6) I also wonder if the API provides enough information in its
> > > >>>>> current
> > > >>>>>>>>>> form. Would it be possible to provide handlers with some way
> > > >> of
> > > >>>>>>>> tracking
> > > >>>>>>>>>> how long a record has been pending for (i.e., how long it's
> > > >> been
> > > >>>>> since
> > > >>>>>>>>> the
> > > >>>>>>>>>> record was provided as an argument to Producer::send)? One way
> > > >>> to
> > > >>>> do
> > > >>>>>>>> this
> > > >>>>>>>>>> could be to add a method like `onNewRecord(ProducerRecord)`
> > > >> and
> > > >>>>>>>>>> allow/require the handler to do its own bookkeeping, probably
> > > >>>> with a
> > > >>>>>>>>>> matching `onRecordSuccess(ProducerRecord)` method so that the
> > > >>>>> handler
> > > >>>>>>>>>> doesn't eat up an ever-increasing amount of memory trying to
> > > >>> track
> > > >>>>>>>>> records.
> > > >>>>>>>>>> An alternative could be to include information about the
> > > >> initial
> > > >>>>> time
> > > >>>>>>>> the
> > > >>>>>>>>>> record was received by the producer and the number of retries
> > > >>> that
> > > >>>>>> have
> > > >>>>>>>>>> been performed for it as parameters in the handle method(s),
> > > >> but
> > > >>>> I'm
> > > >>>>>>>> not
> > > >>>>>>>>>> sure how easy this would be to implement and it might clutter
> > > >>>> things
> > > >>>>>>>> up a
> > > >>>>>>>>>> bit too much.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 7) A small request--can we add Closeable (or, if you prefer,
> > > >>>>>>>>> AutoCloseable)
> > > >>>>>>>>>> as a superinterface for the handler interface?
> > > >>>>>>>>>>
> > > >>>>>>>>>> [1] -
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > https://kafka.apache.org/37/javadoc/org/apache/kafka/connect/mirror/ReplicationPolicy.html
> > > >>>>>>>>>> [2] -
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > https://kafka.apache.org/37/javadoc/org/apache/kafka/connect/mirror/DefaultReplicationPolicy.html
> > > >>>>>>>>>> [3] -
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > https://kafka.apache.org/37/javadoc/org/apache/kafka/connect/mirror/DefaultReplicationPolicy.html#SEPARATOR_CONFIG
> > > >>>>>>>>>> ,
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > https://kafka.apache.org/37/javadoc/org/apache/kafka/connect/mirror/DefaultReplicationPolicy.html#INTERNAL_TOPIC_SEPARATOR_ENABLED_CONFIG
> > > >>>>>>>>>>
> > > >>>>>>>>>> Cheers,
> > > >>>>>>>>>>
> > > >>>>>>>>>> Chris
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Wed, May 8, 2024 at 12:37 AM Matthias J. Sax <
> > > >>> mj...@apache.org
> > > >>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Very interesting discussion.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Seems a central point is the question about "how generic" we
> > > >>>>> approach
> > > >>>>>>>>>>> this, and some people think we need to be conservative and
> > > >>> others
> > > >>>>>>>> think
> > > >>>>>>>>>>> we should try to be as generic as possible.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Personally, I think following a limited scope for this KIP
> > > >> (by
> > > >>>>>>>>>>> explicitly saying we only cover RecordTooLarge and
> > > >>>>>>>>>>> UnknownTopicOrPartition) might be better. We have concrete
> > > >>>> tickets
> > > >>>>>>>> that
> > > >>>>>>>>>>> we address, while for other exception (like authorization) we
> > > >>>> don't
> > > >>>>>>>>> know
> > > >>>>>>>>>>> if people want to handle it to begin with. Boiling the ocean
> > > >>>> might
> > > >>>>>>>> not
> > > >>>>>>>>>>> get us too far, and being somewhat pragmatic might help to
> > > >> move
> > > >>>>> this
> > > >>>>>>>>> KIP
> > > >>>>>>>>>>> forward. -- I also agree with Justin and Artem, that we want
> > > >> to
> > > >>>> be
> > > >>>>>>>>>>> careful anyway to not allow users to break stuff too easily.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> As the same time, I agree that we should setup this change
> > > >> in a
> > > >>>>>>>> forward
> > > >>>>>>>>>>> looking way, and thus having a single generic handler allows
> > > >> us
> > > >>>> to
> > > >>>>>>>>> later
> > > >>>>>>>>>>> extend the handler more easily. This should also simplify
> > > >>> follow
> > > >>>> up
> > > >>>>>>>> KIP
> > > >>>>>>>>>>> that might add new error cases (I actually mentioned one more
> > > >>> to
> > > >>>>>>>> Alieh
> > > >>>>>>>>>>> already, but we both agreed that it might be best to exclude
> > > >> it
> > > >>>>> from
> > > >>>>>>>>> the
> > > >>>>>>>>>>> KIP right now, to make the 3.8 deadline. Doing a follow up
> > > >> KIP
> > > >>> is
> > > >>>>> not
> > > >>>>>>>>>>> the end of the world.)
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> @Chris:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> (2) This sounds fair to me, but not sure how "bad" it
> > > >> actually
> > > >>>>> would
> > > >>>>>>>>> be?
> > > >>>>>>>>>>> If the contract is clearly defined, it seems to be fine what
> > > >>> the
> > > >>>>> KIP
> > > >>>>>>>>>>> proposes, and given that such a handler is an expert API, and
> > > >>> we
> > > >>>>> can
> > > >>>>>>>>>>> provide "best practices" (cf my other comment below in
> > > >> [AL1]),
> > > >>>>> being
> > > >>>>>>>> a
> > > >>>>>>>>>>> little bit pragmatic sound fine to me.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Not sure if I understand Justin's argument on this question?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> (3) About having a default impl or not. I am fine with adding
> > > >>>> one,
> > > >>>>>>>> even
> > > >>>>>>>>>>> if I am not sure why Connect could not just add its own one
> > > >> and
> > > >>>>> plug
> > > >>>>>>>> it
> > > >>>>>>>>>>> in (and we would add corresponding configs for Connect, but
> > > >> not
> > > >>>> for
> > > >>>>>>>> the
> > > >>>>>>>>>>> producer itself)? For this case, we could also do this as a
> > > >>>> follow
> > > >>>>> up
> > > >>>>>>>>>>> KIP, but happy to include it in this KIP to provide value to
> > > >>>>> Connect
> > > >>>>>>>>>>> right away (even if the value might not come right away if we
> > > >>>> miss
> > > >>>>>>>> the
> > > >>>>>>>>>>> 3.8 deadline due to expanded KIP scope...) --  For KS, we
> > > >> would
> > > >>>> for
> > > >>>>>>>>> sure
> > > >>>>>>>>>>> plugin our own impl, and lock down the config such that users
> > > >>>>> cannot
> > > >>>>>>>>> set
> > > >>>>>>>>>>> their own handler on the internal producer to begin with.
> > > >> Might
> > > >>>> be
> > > >>>>>>>> good
> > > >>>>>>>>>>> to elaborate why the producer should have a default? We might
> > > >>>>>>>> actually
> > > >>>>>>>>>>> want to add this to the KIP right away?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> The key for a default impl would be, to not change the
> > > >> current
> > > >>>>>>>>> behavior,
> > > >>>>>>>>>>> and having no default seems to achieve this. For the two
> > > >> cases
> > > >>>> you
> > > >>>>>>>>>>> mentioned, it's unclear to me what default value on "upper
> > > >>> bound
> > > >>>> on
> > > >>>>>>>>>>> retires" for UnkownTopicOrPartitionException we should set?
> > > >>> Seems
> > > >>>>> it
> > > >>>>>>>>>>> would need to be the same as `delivery.timeout.ms`? However,
> > > >>> if
> > > >>>>>>>> users
> > > >>>>>>>>>>> have `delivery.timeout.ms` actually overwritten we would
> > > >> need
> > > >>> to
> > > >>>>> set
> > > >>>>>>>>>>> this config somewhat "dynamic"? Is this feasible? If we
> > > >>>> hard-code 2
> > > >>>>>>>>>>> minutes, it might not be backward compatible. I have the
> > > >>>> impression
> > > >>>>>>>> we
> > > >>>>>>>>>>> might introduce some undesired coupling? -- For the "record
> > > >> too
> > > >>>>>>>> large"
> > > >>>>>>>>>>> case, the config seems to be boolean and setting it to
> > > >> `false`
> > > >>> by
> > > >>>>>>>>>>> default seems to provide backward compatibility.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> @Artem:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [AL1] While I see the point, I would think having a different
> > > >>>>>>>> callback
> > > >>>>>>>>>>> for every exception might not really be elegant? In the end,
> > > >>> the
> > > >>>>>>>>> handler
> > > >>>>>>>>>>> is an very advanced feature anyway, and if it's implemented
> > > >> in
> > > >>> a
> > > >>>>> bad
> > > >>>>>>>>>>> way, well, it's a user error -- we cannot protect users from
> > > >>>>>>>>> everything.
> > > >>>>>>>>>>> To me, a handler like this, is to some extend "business
> > > >> logic"
> > > >>>> and
> > > >>>>>>>> if a
> > > >>>>>>>>>>> user gets business logic wrong, it's hard to protect them. --
> > > >>> We
> > > >>>>>>>> would
> > > >>>>>>>>>>> of course provide best practice guidance in the JaveDocs, and
> > > >>>>> explain
> > > >>>>>>>>>>> that a handler should have explicit `if` statements for stuff
> > > >>> it
> > > >>>>> want
> > > >>>>>>>>> to
> > > >>>>>>>>>>> handle, and only a single default which return FAIL.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [AL2] Yes, but for KS we would retry at the application
> > > >> layer.
> > > >>>> Ie,
> > > >>>>>>>> the
> > > >>>>>>>>>>> TX is not completed, we clean up and setup out task from
> > > >>> scratch,
> > > >>>>> to
> > > >>>>>>>>>>> ensure the pending transaction is completed before we resume.
> > > >>> If
> > > >>>>> the
> > > >>>>>>>> TX
> > > >>>>>>>>>>> was indeed aborted, we would retry from older offset and thus
> > > >>>> just
> > > >>>>>>>> hit
> > > >>>>>>>>>>> the same error again and the loop begins again.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [AL2 cont.] Similar to AL1, I see such a handler to some
> > > >> extend
> > > >>>> as
> > > >>>>>>>>>>> business logic. If a user puts a bad filter condition in
> > > >> their
> > > >>> KS
> > > >>>>>>>> app,
> > > >>>>>>>>>>> and drops messages, it nothing we can do about it, and this
> > > >>>> handler
> > > >>>>>>>>>>> IMHO, has a similar purpose. This is also the line of
> > > >> thinking
> > > >>> I
> > > >>>>>>>> apply
> > > >>>>>>>>>>> to EOS, to address Justin's concern about "should we allow to
> > > >>>> drop
> > > >>>>>>>> for
> > > >>>>>>>>>>> EOS", and my answer is "yes", because it's more business
> > > >> logic
> > > >>>> than
> > > >>>>>>>>>>> actual error handling IMHO. And by default, we fail... So
> > > >> users
> > > >>>>>>>> opt-in
> > > >>>>>>>>>>> to add business logic to drop records. It's an application
> > > >>> level
> > > >>>>>>>>>>> decision how to write the code.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [AL3] Maybe I misunderstand what you are saying, but to me,
> > > >>>>> checking
> > > >>>>>>>>> the
> > > >>>>>>>>>>> size of the record upfront is exactly what the KIP proposes?
> > > >>> No?
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> @Justin:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> I saw the sample
> > > >>>>>>>>>>>> code -- is it just an if statement checking for the error
> > > >>> before
> > > >>>>>>>> the
> > > >>>>>>>>>>>> handler is invoked? That seems a bit fragile.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> What do you mean by fragile? Not sure if I see your point.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> -Matthias
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On 5/7/24 5:33 PM, Artem Livshits wrote:
> > > >>>>>>>>>>>> Hi Alieh,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Thanks for the KIP.  The motivation talks about very
> > > >> specific
> > > >>>>>>>> cases,
> > > >>>>>>>>>> but
> > > >>>>>>>>>>>> the interface is generic.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> [AL1]
> > > >>>>>>>>>>>> If the interface evolves in the future I think we could have
> > > >>> the
> > > >>>>>>>>>>> following
> > > >>>>>>>>>>>> confusion:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> 1. A user implemented SWALLOW action for both
> > > >>>>>>>> RecordTooLargeException
> > > >>>>>>>>>> and
> > > >>>>>>>>>>>> UnknownTopicOrPartitionException.  For simpicity they just
> > > >>>> return
> > > >>>>>>>>>> SWALLOW
> > > >>>>>>>>>>>> from the function, because it elegantly handles all known
> > > >>> cases.
> > > >>>>>>>>>>>> 2. The interface has evolved to support a new exception.
> > > >>>>>>>>>>>> 3. The user has upgraded their Kafka client.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Now a new kind of error gets dropped on the floor without
> > > >>> user's
> > > >>>>>>>>>>> intention
> > > >>>>>>>>>>>> and it would be super hard to detect and debug.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> To avoid the confusion, I think we should use handlers for
> > > >>>>> specific
> > > >>>>>>>>>>>> exceptions.  Then if a new exception is added it won't get
> > > >>>>> silently
> > > >>>>>>>>>>> swalled
> > > >>>>>>>>>>>> because the user would need to add new functionality to
> > > >> handle
> > > >>>> it.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I also have some higher level comments:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> [AL2]
> > > >>>>>>>>>>>>> it throws a TimeoutException, and the user can only blindly
> > > >>>>> retry,
> > > >>>>>>>>>> which
> > > >>>>>>>>>>>> may result in an infinite retry loop
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> If the TimeoutException happens during transactional
> > > >>> processing
> > > >>>>>>>>>> (exactly
> > > >>>>>>>>>>>> once is the desired sematnics), then the client should not
> > > >>> retry
> > > >>>>>>>> when
> > > >>>>>>>>>> it
> > > >>>>>>>>>>>> gets TimeoutException because without knowing the reason for
> > > >>>>>>>>>>>> TimeoutExceptions, the client cannot know whether the
> > > >> message
> > > >>>> got
> > > >>>>>>>>>>> actually
> > > >>>>>>>>>>>> produced or not and retrying the message may result in
> > > >>>>> duplicatees.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> The thrown TimeoutException "cuts" the connection to the
> > > >>>>>>>> underlying
> > > >>>>>>>>>> root
> > > >>>>>>>>>>>> cause of missing metadata
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Maybe we should fix the error handling and return the proper
> > > >>>>>>>>> underlying
> > > >>>>>>>>>>>> message?  Then the application can properly handle the
> > > >> message
> > > >>>>>>>> based
> > > >>>>>>>>> on
> > > >>>>>>>>>>>> preferences.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> From the product perspective, it's not clear how safe it is
> > > >> to
> > > >>>>>>>>> blindly
> > > >>>>>>>>>>>> ignore UnknownTopicOrPartitionException.  This could lead to
> > > >>>>>>>>> situations
> > > >>>>>>>>>>>> when a simple typo could lead to massive data loss (part of
> > > >>> the
> > > >>>>>>>> data
> > > >>>>>>>>>>> would
> > > >>>>>>>>>>>> effectively be produced to a "black hole" and the user may
> > > >> not
> > > >>>>>>>> notice
> > > >>>>>>>>>> it
> > > >>>>>>>>>>>> for a while).
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> In which situations would you recommend the user to "black
> > > >>> hole"
> > > >>>>>>>>>> messages
> > > >>>>>>>>>>>> in case of misconfiguration?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> [AL3]
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> If the custom handler decides on SWALLOW for
> > > >>>>>>>>> RecordTooLargeException,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Is it my understanding that this KIP proposes that
> > > >>> functionality
> > > >>>>>>>> that
> > > >>>>>>>>>>> would
> > > >>>>>>>>>>>> only be able to SWALLOW RecordTooLargeException that happen
> > > >>>>> because
> > > >>>>>>>>> the
> > > >>>>>>>>>>>> producer cannot produce the record (if the broker rejects
> > > >> the
> > > >>>>>>>> batch,
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>> error won't get to the handler, because we cannot know which
> > > >>>> other
> > > >>>>>>>>>>> records
> > > >>>>>>>>>>>> get ignored).  In this case, why not just check the locally
> > > >>>>>>>>> configured
> > > >>>>>>>>>>> max
> > > >>>>>>>>>>>> record size upfront and not produce the recrord in the first
> > > >>>>> place?
> > > >>>>>>>>>>> Maybe
> > > >>>>>>>>>>>> we can expose a validation function from the producer that
> > > >>> could
> > > >>>>>>>>>> validate
> > > >>>>>>>>>>>> the records locally, so we don't need to produce the record
> > > >> in
> > > >>>>>>>> order
> > > >>>>>>>>> to
> > > >>>>>>>>>>>> know that it's invalid.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> -Artem
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Tue, May 7, 2024 at 2:07 PM Justine Olshan
> > > >>>>>>>>>>> <jols...@confluent.io.invalid>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Alieh and Chris,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Thanks for clarifying 1) but I saw the motivation. I guess
> > > >> I
> > > >>>> just
> > > >>>>>>>>>> didn't
> > > >>>>>>>>>>>>> understand how that would be ensured on the producer side.
> > > >> I
> > > >>>> saw
> > > >>>>>>>> the
> > > >>>>>>>>>>> sample
> > > >>>>>>>>>>>>> code -- is it just an if statement checking for the error
> > > >>>> before
> > > >>>>>>>> the
> > > >>>>>>>>>>>>> handler is invoked? That seems a bit fragile.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Can you clarify what you mean by `since the code does not
> > > >>> reach
> > > >>>>>>>> the
> > > >>>>>>>>> KS
> > > >>>>>>>>>>>>> interface and breaks somewhere in producer.` If we surfaced
> > > >>>> this
> > > >>>>>>>>> error
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>>>> the application in a better way would that also be a
> > > >> solution
> > > >>>> to
> > > >>>>>>>> the
> > > >>>>>>>>>>> issue?
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Justine
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Tue, May 7, 2024 at 1:55 PM Alieh Saeedi
> > > >>>>>>>>>>> <asae...@confluent.io.invalid>
> > > >>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hi,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Thank you, Chris and Justine, for the feedback.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> @Chris
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> 1) Flexibility: it has two meanings. The first meaning is
> > > >>> the
> > > >>>>> one
> > > >>>>>>>>> you
> > > >>>>>>>>>>>>>> mentioned. We are going to cover more exceptions in the
> > > >>>> future,
> > > >>>>>>>> but
> > > >>>>>>>>>> as
> > > >>>>>>>>>>>>>> Justine mentioned, we must be very conservative about
> > > >> adding
> > > >>>>> more
> > > >>>>>>>>>>>>>> exceptions. Additionally, flexibility mainly means that
> > > >> the
> > > >>>> user
> > > >>>>>>>> is
> > > >>>>>>>>>>> able
> > > >>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>> develop their own code. As mentioned in the motivation
> > > >>> section
> > > >>>>>>>> and
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>>> examples, sometimes the user decides on dropping a record
> > > >>>> based
> > > >>>>>>>> on
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>>> topic, for example.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> 2) Defining two separate methods for retriable and
> > > >>>> non-retriable
> > > >>>>>>>>>>>>>> exceptions: although the idea is brilliant, the user may
> > > >>> still
> > > >>>>>>>>> make a
> > > >>>>>>>>>>>>>> mistake by implementing the wrong method and see a
> > > >>>> non-expecting
> > > >>>>>>>>>>>>> behaviour.
> > > >>>>>>>>>>>>>> For example, he may implement handleRetriable() for
> > > >>>>>>>>>>>>> RecordTooLargeException
> > > >>>>>>>>>>>>>> and define SWALLOW for the exception, but in practice, he
> > > >>> sees
> > > >>>>> no
> > > >>>>>>>>>>> change
> > > >>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>> default behaviour since he implemented the wrong method. I
> > > >>>> think
> > > >>>>>>>> we
> > > >>>>>>>>>> can
> > > >>>>>>>>>>>>>> never reduce the user’s mistakes to 0.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> 3) Default implementation for Handler: the default
> > > >> behaviour
> > > >>>> is
> > > >>>>>>>>>> already
> > > >>>>>>>>>>>>>> preserved with NO need of implementing any handler or
> > > >>> setting
> > > >>>>> the
> > > >>>>>>>>>>>>>> corresponding config parameter `custom.exception.handler`.
> > > >>>> What
> > > >>>>>>>> you
> > > >>>>>>>>>>> mean
> > > >>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>> actually having a second default, which requires having
> > > >> both
> > > >>>>>>>>>> interface
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>> config parameters. About UnknownTopicOrPartitionException:
> > > >>> the
> > > >>>>>>>>>> producer
> > > >>>>>>>>>>>>>> already offers the config parameter `max.block.ms` which
> > > >>>>>>>>> determines
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>> duration of retrying. The main purpose of the user who
> > > >> needs
> > > >>>> the
> > > >>>>>>>>>>> handler
> > > >>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>> to get the root cause of TimeoutException and handle it in
> > > >>> the
> > > >>>>>>>> way
> > > >>>>>>>>> he
> > > >>>>>>>>>>>>>> intends. The KIP explains the necessity of it for KS
> > > >> users.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> 4) Naming issue: By SWALLOW, we meant actually swallow the
> > > >>>>> error,
> > > >>>>>>>>>> while
> > > >>>>>>>>>>>>>> SKIP means skip the record, I think. If it makes sense for
> > > >>>> more
> > > >>>>>>>>> ppl,
> > > >>>>>>>>>> I
> > > >>>>>>>>>>>>> can
> > > >>>>>>>>>>>>>> change it to SKIP
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> @Justine
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> 1) was addressed by Chris.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> 2 and 3) The problem is exactly what you mentioned.
> > > >>> Currently,
> > > >>>>>>>>> there
> > > >>>>>>>>>> is
> > > >>>>>>>>>>>>> no
> > > >>>>>>>>>>>>>> way to handle these issues application-side. Even KS users
> > > >>> who
> > > >>>>>>>>>>> implement
> > > >>>>>>>>>>>>> KS
> > > >>>>>>>>>>>>>> ProductionExceptionHandler are not able to handle the
> > > >>>> exceptions
> > > >>>>>>>> as
> > > >>>>>>>>>>> they
> > > >>>>>>>>>>>>>> intend since the code does not reach the KS interface and
> > > >>>> breaks
> > > >>>>>>>>>>>>> somewhere
> > > >>>>>>>>>>>>>> in producer.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>> Alieh
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Tue, May 7, 2024 at 8:43 PM Chris Egerton <
> > > >>>>>>>>>> fearthecel...@gmail.com>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Hi Justine,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> The method signatures for the interface are indeed
> > > >>>> open-ended,
> > > >>>>>>>> but
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>>> KIP
> > > >>>>>>>>>>>>>>> states that its uses will be limited. See the motivation
> > > >>>>>>>> section:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> We believe that the user should be able to develop
> > > >> custom
> > > >>>>>>>>> exception
> > > >>>>>>>>>>>>>>> handlers for managing producer exceptions. On the other
> > > >>> hand,
> > > >>>>>>>> this
> > > >>>>>>>>>>> will
> > > >>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>> an expert-level API, and using that may result in strange
> > > >>>>>>>>> behaviour
> > > >>>>>>>>>> in
> > > >>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> system, making it hard to find the root cause. Therefore,
> > > >>> the
> > > >>>>>>>>> custom
> > > >>>>>>>>>>>>>>> handler is currently limited to handling
> > > >>>>> RecordTooLargeException
> > > >>>>>>>>> and
> > > >>>>>>>>>>>>>>> UnknownTopicOrPartitionException.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Chris
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Tue, May 7, 2024, 14:37 Justine Olshan
> > > >>>>>>>>>>> <jols...@confluent.io.invalid
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Hi Alieh,
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I was out for KSB and then was also sick. :(
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> To your point 1) Chris, I don't think it is limited to
> > > >> two
> > > >>>>>>>>> specific
> > > >>>>>>>>>>>>>>>> scenarios, since the interface accepts a generic
> > > >>> Exception e
> > > >>>>>>>> and
> > > >>>>>>>>>> can
> > > >>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>> implemented to check if that e is an instanceof any
> > > >>>> exception.
> > > >>>>>>>> I
> > > >>>>>>>>>>>>> didn't
> > > >>>>>>>>>>>>>>> see
> > > >>>>>>>>>>>>>>>> anywhere that specific errors are enforced. I'm a bit
> > > >>>>> concerned
> > > >>>>>>>>>> about
> > > >>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>> actually. I'm concerned about the opened-endedness and
> > > >> the
> > > >>>>>>>>> contract
> > > >>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>> have
> > > >>>>>>>>>>>>>>>> with transactions. We are allowing the client to make
> > > >>>>> decisions
> > > >>>>>>>>>> that
> > > >>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>>> somewhat invisible to the server. As an aside, can we
> > > >>> build
> > > >>>> in
> > > >>>>>>>>> log
> > > >>>>>>>>>>>>>>> messages
> > > >>>>>>>>>>>>>>>> when the handler decides to skip etc a message. I'm
> > > >> really
> > > >>>>>>>>>> concerned
> > > >>>>>>>>>>>>>>> about
> > > >>>>>>>>>>>>>>>> messages being silently dropped.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I do think Chris's point 2) about retriable vs non
> > > >>> retriable
> > > >>>>>>>>> errors
> > > >>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>> fair. I'm a bit concerned about skipping a unknown topic
> > > >>> or
> > > >>>>>>>>>> partition
> > > >>>>>>>>>>>>>>>> exception too early, as there are cases where it can be
> > > >>>>>>>>> transient.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I'm still a little bit wary of allowing dropping records
> > > >>> as
> > > >>>>>>>> part
> > > >>>>>>>>> of
> > > >>>>>>>>>>>>> EOS
> > > >>>>>>>>>>>>>>>> generally as in many cases, these errors signify an
> > > >> issue
> > > >>>> with
> > > >>>>>>>>> the
> > > >>>>>>>>>>>>>>> original
> > > >>>>>>>>>>>>>>>> data. I understand that streams and connect/mirror maker
> > > >>> may
> > > >>>>>>>> have
> > > >>>>>>>>>>>>>> reasons
> > > >>>>>>>>>>>>>>>> they want to progress past these messages, but wondering
> > > >>> if
> > > >>>>>>>> there
> > > >>>>>>>>>> is
> > > >>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>> way
> > > >>>>>>>>>>>>>>>> that can be done application-side. I'm willing to accept
> > > >>>> this
> > > >>>>>>>>> sort
> > > >>>>>>>>>> of
> > > >>>>>>>>>>>>>>>> proposal if we can make it clear that this sort of thing
> > > >>> is
> > > >>>>>>>>>> happening
> > > >>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>> we limit the blast radius for what we can do.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Justine
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Tue, May 7, 2024 at 9:55 AM Chris Egerton
> > > >>>>>>>>>> <chr...@aiven.io.invalid
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Hi Alieh,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Sorry for the delay, I've been out sick. I still have
> > > >>> some
> > > >>>>>>>>>> thoughts
> > > >>>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>>> I'd like to see addressed before voting.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 1) If flexibility is the motivation for a pluggable
> > > >>>>> interface,
> > > >>>>>>>>> why
> > > >>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>> only limiting the uses for this interface to two very
> > > >>>>> specific
> > > >>>>>>>>>>>>>>> scenarios?
> > > >>>>>>>>>>>>>>>>> Why not also allow, e.g., authorization errors to be
> > > >>>> handled
> > > >>>>>>>> as
> > > >>>>>>>>>>>>> well
> > > >>>>>>>>>>>>>>>>> (allowing users to drop records destined for some
> > > >>>> off-limits
> > > >>>>>>>>>>>>> topics,
> > > >>>>>>>>>>>>>> or
> > > >>>>>>>>>>>>>>>>> retry for a limited duration in case there's a delay in
> > > >>> the
> > > >>>>>>>>>>>>>> propagation
> > > >>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>> ACL updates)? It'd be nice to see some analysis of
> > > >> other
> > > >>>>>>>> errors
> > > >>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>> could
> > > >>>>>>>>>>>>>>>>> be handled with this new API, both to avoid the
> > > >> follow-up
> > > >>>>> work
> > > >>>>>>>>> of
> > > >>>>>>>>>>>>>>> another
> > > >>>>>>>>>>>>>>>>> KIP to address them in the future, and to make sure
> > > >> that
> > > >>>>> we're
> > > >>>>>>>>> not
> > > >>>>>>>>>>>>>>>> painting
> > > >>>>>>>>>>>>>>>>> ourselves into a corner with the current API in a way
> > > >>> that
> > > >>>>>>>> would
> > > >>>>>>>>>>>>> make
> > > >>>>>>>>>>>>>>>>> future modifications difficult.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 2) Something feels a bit off with how retriable vs.
> > > >>>>>>>>> non-retriable
> > > >>>>>>>>>>>>>>> errors
> > > >>>>>>>>>>>>>>>>> are handled with the interface. Why not introduce two
> > > >>>>> separate
> > > >>>>>>>>>>>>>> methods
> > > >>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>> handle each case separately? That way there's no
> > > >>> ambiguity
> > > >>>> or
> > > >>>>>>>>>>>>>> implicit
> > > >>>>>>>>>>>>>>>>> behavior when, e.g., attempting to retry on a
> > > >>>>>>>>>>>>>> RecordTooLargeException.
> > > >>>>>>>>>>>>>>>> This
> > > >>>>>>>>>>>>>>>>> could be something like `NonRetriableResponse
> > > >>>>>>>>>>>>> handle(ProducerRecord,
> > > >>>>>>>>>>>>>>>>> Exception)` and `RetriableResponse
> > > >>>>>>>>> handleRetriable(ProducerRecord,
> > > >>>>>>>>>>>>>>>>> Exception)`, though the exact names and shape can
> > > >>> obviously
> > > >>>>> be
> > > >>>>>>>>>>>>> toyed
> > > >>>>>>>>>>>>>>>> with a
> > > >>>>>>>>>>>>>>>>> bit.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 3) Although the flexibility of a pluggable interface
> > > >> may
> > > >>>>>>>> benefit
> > > >>>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>>>> users' custom producer applications and Kafka Streams
> > > >>>>>>>>>> applications,
> > > >>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>> comes at significant deployment cost for other
> > > >>> low-/no-code
> > > >>>>>>>>>>>>>>> environments,
> > > >>>>>>>>>>>>>>>>> including but not limited to Kafka Connect and
> > > >>> MirrorMaker
> > > >>>> 2.
> > > >>>>>>>>> Can
> > > >>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>> add
> > > >>>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>> default implementation of the exception handler that
> > > >>> allows
> > > >>>>>>>> for
> > > >>>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>>> simple
> > > >>>>>>>>>>>>>>>>> behavior to be tweaked via configuration property? Two
> > > >>>> things
> > > >>>>>>>>> that
> > > >>>>>>>>>>>>>>> would
> > > >>>>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>> nice to have would be A) an upper bound on the retry
> > > >> time
> > > >>>> for
> > > >>>>>>>>>>>>>>>>> unknown-topic-partition exceptions and B) an option to
> > > >>> drop
> > > >>>>>>>>>> records
> > > >>>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>>> are large enough to trigger a record-too-large
> > > >> exception.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 4) I'd still prefer to see "SKIP" or "DROP" instead of
> > > >>> the
> > > >>>>>>>>>> proposed
> > > >>>>>>>>>>>>>>>>> "SWALLOW" option, which IMO is opaque and non-obvious,
> > > >>>>>>>>> especially
> > > >>>>>>>>>>>>>> when
> > > >>>>>>>>>>>>>>>>> trying to guess the behavior for retriable errors.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Chris
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Fri, May 3, 2024 at 11:23 AM Alieh Saeedi
> > > >>>>>>>>>>>>>>>> <asae...@confluent.io.invalid
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Hi all,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> A summary of the KIP and the discussions:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> The KIP introduces a handler interface for Producer in
> > > >>>> order
> > > >>>>>>>> to
> > > >>>>>>>>>>>>>>> handle
> > > >>>>>>>>>>>>>>>>> two
> > > >>>>>>>>>>>>>>>>>> exceptions: RecordTooLargeException and
> > > >>>>>>>>>>>>>>>> UnknownTopicOrPartitionException.
> > > >>>>>>>>>>>>>>>>>> The handler handles the exceptions per-record.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> - Do we need this handler?  [Motivation and Examples
> > > >>>>>>>> sections]
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> RecordTooLargeException: 1) In transactions, the
> > > >>> producer
> > > >>>>>>>>>>>>> collects
> > > >>>>>>>>>>>>>>>>> multiple
> > > >>>>>>>>>>>>>>>>>> records in batches. Then a RecordTooLargeException
> > > >>> related
> > > >>>>>>>> to a
> > > >>>>>>>>>>>>>>> single
> > > >>>>>>>>>>>>>>>>>> record leads to failing the entire batch. A custom
> > > >>>> exception
> > > >>>>>>>>>>>>>> handler
> > > >>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>> this case may decide on dropping the record and
> > > >>> continuing
> > > >>>>>>>> the
> > > >>>>>>>>>>>>>>>>> processing.
> > > >>>>>>>>>>>>>>>>>> See Example 1, please. 2) More over, in Kafka
> > > >> Streams, a
> > > >>>>>>>> record
> > > >>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>> too
> > > >>>>>>>>>>>>>>>>>> large is a poison pill record, and there is no way to
> > > >>> skip
> > > >>>>>>>> over
> > > >>>>>>>>>>>>>> it. A
> > > >>>>>>>>>>>>>>>>>> handler would allow us to react to this error inside
> > > >> the
> > > >>>>>>>>>>>>> producer,
> > > >>>>>>>>>>>>>>>> i.e.,
> > > >>>>>>>>>>>>>>>>>> local to where the error happens, and thus simplify
> > > >> the
> > > >>>>>>>> overall
> > > >>>>>>>>>>>>>> code
> > > >>>>>>>>>>>>>>>>>> significantly. Please read the Motivation section for
> > > >>> more
> > > >>>>>>>>>>>>>>> explanation.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> UnknownTopicOrPartitionException: For this case, the
> > > >>>>> producer
> > > >>>>>>>>>>>>>> handles
> > > >>>>>>>>>>>>>>>>> this
> > > >>>>>>>>>>>>>>>>>> exception internally and only issues a WARN log about
> > > >>>>> missing
> > > >>>>>>>>>>>>>>> metadata
> > > >>>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>> retries internally. Later, when the producer hits "
> > > >>>>>>>>>>>>>>> deliver.timeout.ms"
> > > >>>>>>>>>>>>>>>>> it
> > > >>>>>>>>>>>>>>>>>> throws a TimeoutException, and the user can only
> > > >> blindly
> > > >>>>>>>> retry,
> > > >>>>>>>>>>>>>> which
> > > >>>>>>>>>>>>>>>> may
> > > >>>>>>>>>>>>>>>>>> result in an infinite retry loop. The thrown
> > > >>>>> TimeoutException
> > > >>>>>>>>>>>>>> "cuts"
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> connection to the underlying root cause of missing
> > > >>>> metadata
> > > >>>>>>>>>>>>> (which
> > > >>>>>>>>>>>>>>>> could
> > > >>>>>>>>>>>>>>>>>> indeed be a transient error but is persistent for a
> > > >>>>>>>>> non-existing
> > > >>>>>>>>>>>>>>>> topic).
> > > >>>>>>>>>>>>>>>>>> Thus, there is no programmatic way to break the
> > > >> infinite
> > > >>>>>>>> retry
> > > >>>>>>>>>>>>>> loop.
> > > >>>>>>>>>>>>>>>>> Kafka
> > > >>>>>>>>>>>>>>>>>> Streams also blindly retries for this case, and the
> > > >>>>>>>> application
> > > >>>>>>>>>>>>>> gets
> > > >>>>>>>>>>>>>>>>> stuck.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> - Having interface vs configuration option:
> > > >> [Motivation,
> > > >>>>>>>>>>>>> Examples,
> > > >>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>> Rejected Alternatives sections]
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Our solution is introducing an interface due to the
> > > >> full
> > > >>>>>>>>>>>>>> flexibility
> > > >>>>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>>>> it offers. Sometimes users, especially Kafka Streams
> > > >>> ones,
> > > >>>>>>>>>>>>>> determine
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> handler's behaviour based on the situation. For
> > > >>> example, f
> > > >>>>>>>>>>>>>>>>>> acing UnknownTopicOrPartitionException*, *the user may
> > > >>>> want
> > > >>>>>>>> to
> > > >>>>>>>>>>>>>> raise
> > > >>>>>>>>>>>>>>> an
> > > >>>>>>>>>>>>>>>>>> error for some topics but retry it for other topics.
> > > >>>> Having
> > > >>>>> a
> > > >>>>>>>>>>>>>>>>> configuration
> > > >>>>>>>>>>>>>>>>>> option with a fixed set of possibilities does not
> > > >> serve
> > > >>>> the
> > > >>>>>>>>>>>>> user's
> > > >>>>>>>>>>>>>>>>>> needs. See Example 2, please.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> - Note on RecordTooLargeException: [Public Interfaces
> > > >>>>>>>> section]
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> If the custom handler decides on SWALLOW for
> > > >>>>>>>>>>>>>> RecordTooLargeException,
> > > >>>>>>>>>>>>>>>>> then
> > > >>>>>>>>>>>>>>>>>> this record will not be a part of the batch of
> > > >>>> transactions
> > > >>>>>>>> and
> > > >>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>> also
> > > >>>>>>>>>>>>>>>>>> not be sent to the broker in non-transactional mode.
> > > >> So
> > > >>> no
> > > >>>>>>>>>>>>> worries
> > > >>>>>>>>>>>>>>>> about
> > > >>>>>>>>>>>>>>>>>> getting a RecordTooLargeException from the broker in
> > > >>> this
> > > >>>>>>>> case,
> > > >>>>>>>>>>>>> as
> > > >>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> record will never ever be sent to the broker. SWALLOW
> > > >>>> means
> > > >>>>>>>>> drop
> > > >>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>> record
> > > >>>>>>>>>>>>>>>>>> and continue/swallow the error.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> - What if the handle() method implements RETRY for
> > > >>>>>>>>>>>>>>>>> RecordTooLargeException?
> > > >>>>>>>>>>>>>>>>>> [Proposed Changes section]
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> We have to limit the user to only have FAIL or SWALLOW
> > > >>> for
> > > >>>>>>>>>>>>>>>>>> RecordTooLargeException. Actually, RETRY must be equal
> > > >>> to
> > > >>>>>>>> FAIL.
> > > >>>>>>>>>>>>>> This
> > > >>>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>> well documented/informed in javadoc.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> - What if the handle() method of the handler throws an
> > > >>>>>>>>> exception?
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> The handler is expected to have correct code. If it
> > > >>> throws
> > > >>>>> an
> > > >>>>>>>>>>>>>>>> exception,
> > > >>>>>>>>>>>>>>>>>> everything fails.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> This is a PoC PR <
> > > >>>>> https://github.com/apache/kafka/pull/15846
> > > >>>>>>>>>
> > > >>>>>>>>>>>>> ONLY
> > > >>>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>>> RecordTooLargeException. The code changes related to
> > > >>>>>>>>>>>>>>>>>> UnknownTopicOrPartitionException will be added to this
> > > >>> PR
> > > >>>>>>>>> LATER.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Looking forward to your feedback again.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Alieh
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Thu, Apr 25, 2024 at 11:46 PM Kirk True <
> > > >>>>>>>> k...@kirktrue.pro>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Hi Alieh,
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Thanks for the updates!
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Comments inline...
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Apr 25, 2024, at 1:10 PM, Alieh Saeedi
> > > >>>>>>>>>>>>>>>>> <asae...@confluent.io.INVALID
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Hi all,
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Thanks a lot for the constructive feedbacks!
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Addressing some of the main concerns:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> - The `RecordTooLargeException` can be thrown by
> > > >>> broker,
> > > >>>>>>>>>>>>>> producer
> > > >>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>>> consumer. Of course, the `ProducerExceptionHandler`
> > > >>>>>>>> interface
> > > >>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>>> introduced
> > > >>>>>>>>>>>>>>>>>>>> to affect only the exceptions thrown from the
> > > >>> producer.
> > > >>>>>>>> This
> > > >>>>>>>>>>>>>> KIP
> > > >>>>>>>>>>>>>>>> very
> > > >>>>>>>>>>>>>>>>>>>> specifically means to provide a possibility to
> > > >> manage
> > > >>>> the
> > > >>>>>>>>>>>>>>>>>>>> `RecordTooLargeException` thrown from the
> > > >>>> Producer.send()
> > > >>>>>>>>>>>>>> method.
> > > >>>>>>>>>>>>>>>>>> Please
> > > >>>>>>>>>>>>>>>>>>>> see “Proposed Changes” section for more clarity. I
> > > >>>>>>>>>>>>> investigated
> > > >>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> issue
> > > >>>>>>>>>>>>>>>>>>>> there thoroughly. I hope it can explain the concern
> > > >>>> about
> > > >>>>>>>> how
> > > >>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>> handle
> > > >>>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>> errors as well.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> - The problem with Callback: Methods of Callback are
> > > >>>>> called
> > > >>>>>>>>>>>>>> when
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> record
> > > >>>>>>>>>>>>>>>>>>>> sent to the server is acknowledged, while this is
> > > >> not
> > > >>>> the
> > > >>>>>>>>>>>>>> desired
> > > >>>>>>>>>>>>>>>>> time
> > > >>>>>>>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>>>>> all exceptions. We intend to handle exceptions
> > > >>>> beforehand.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I guess it makes sense to keep the expectation for
> > > >> when
> > > >>>>>>>>>>>>> Callback
> > > >>>>>>>>>>>>>> is
> > > >>>>>>>>>>>>>>>>>>> invoked as-is vs. shoehorning more into it.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> - What if the custom handler returns RETRY for
> > > >>>>>>>>>>>>>>>>>>> `RecordTooLargeException`? I
> > > >>>>>>>>>>>>>>>>>>>> assume changing the producer configuration at
> > > >> runtime
> > > >>> is
> > > >>>>>>>>>>>>>>> possible.
> > > >>>>>>>>>>>>>>>> If
> > > >>>>>>>>>>>>>>>>>> so,
> > > >>>>>>>>>>>>>>>>>>>> RETRY for a too large record is valid because maybe
> > > >> in
> > > >>>> the
> > > >>>>>>>>>>>>> next
> > > >>>>>>>>>>>>>>>> try,
> > > >>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>> too large record is not poisoning any more. I am not
> > > >>>> 100%
> > > >>>>>>>>>>>>> sure
> > > >>>>>>>>>>>>>>>> about
> > > >>>>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>>> technical details, though. Otherwise, we can
> > > >> consider
> > > >>>> the
> > > >>>>>>>>>>>>> RETRY
> > > >>>>>>>>>>>>>>> as
> > > >>>>>>>>>>>>>>>>> FAIL
> > > >>>>>>>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>>>>> this exception. Another solution would be to
> > > >> consider
> > > >>> a
> > > >>>>>>>>>>>>>> constant
> > > >>>>>>>>>>>>>>>>> number
> > > >>>>>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>>>> times for RETRY which can be useful for other
> > > >>> exceptions
> > > >>>>> as
> > > >>>>>>>>>>>>>> well.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> It’s not presently possible to change the
> > > >> configuration
> > > >>>> of
> > > >>>>>>>> an
> > > >>>>>>>>>>>>>>>> existing
> > > >>>>>>>>>>>>>>>>>>> Producer at runtime. So if a record hits a
> > > >>>>>>>>>>>>>> RecordTooLargeException
> > > >>>>>>>>>>>>>>>>> once,
> > > >>>>>>>>>>>>>>>>>> no
> > > >>>>>>>>>>>>>>>>>>> amount of retrying (with the current Producer) will
> > > >>>> change
> > > >>>>>>>>> that
> > > >>>>>>>>>>>>>>> fact.
> > > >>>>>>>>>>>>>>>>> So
> > > >>>>>>>>>>>>>>>>>>> I’m still a little stuck on how to handle a response
> > > >> of
> > > >>>>>>>> RETRY
> > > >>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>> an
> > > >>>>>>>>>>>>>>>>>>> “oversized” record.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> - What if the handle() method itself throws an
> > > >>>> exception?
> > > >>>>> I
> > > >>>>>>>>>>>>>> think
> > > >>>>>>>>>>>>>>>>>>>> rationally and pragmatically, the behaviour must be
> > > >>>>> exactly
> > > >>>>>>>>>>>>>> like
> > > >>>>>>>>>>>>>>>> when
> > > >>>>>>>>>>>>>>>>>> no
> > > >>>>>>>>>>>>>>>>>>>> custom handler is defined since the user actually
> > > >> did
> > > >>>> not
> > > >>>>>>>>>>>>> have
> > > >>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>>>> working
> > > >>>>>>>>>>>>>>>>>>>> handler.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I’m not convinced that ignoring an errant handler is
> > > >>> the
> > > >>>>>>>> right
> > > >>>>>>>>>>>>>>>> choice.
> > > >>>>>>>>>>>>>>>>> It
> > > >>>>>>>>>>>>>>>>>>> then becomes a silent failure that might have
> > > >>>>> repercussions,
> > > >>>>>>>>>>>>>>>> depending
> > > >>>>>>>>>>>>>>>>> on
> > > >>>>>>>>>>>>>>>>>>> the business logic. A user would have to proactively
> > > >>>> trawls
> > > >>>>>>>>>>>>>> through
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>>> logs for WARN/ERROR messages to catch it.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Throwing a hard error is pretty draconian, though…
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> - Why not use config parameters instead of an
> > > >>> interface?
> > > >>>>> As
> > > >>>>>>>>>>>>>>>> explained
> > > >>>>>>>>>>>>>>>>>> in
> > > >>>>>>>>>>>>>>>>>>>> the “Rejected Alternatives” section, we assume that
> > > >>> the
> > > >>>>>>>>>>>>> handler
> > > >>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>> be
> > > >>>>>>>>>>>>>>>>>>>> used for a greater number of exceptions in the
> > > >> future.
> > > >>>>>>>>>>>>>> Defining a
> > > >>>>>>>>>>>>>>>>>>>> configuration parameter for each exception may make
> > > >>> the
> > > >>>>>>>>>>>>>>>>> configuration a
> > > >>>>>>>>>>>>>>>>>>> bit
> > > >>>>>>>>>>>>>>>>>>>> messy. Moreover, the handler offers more
> > > >> flexibility.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Agreed that the logic-via-configuration approach is
> > > >>> weird
> > > >>>>>>>> and
> > > >>>>>>>>>>>>>>>> limiting.
> > > >>>>>>>>>>>>>>>>>>> Forget I ever suggested it ;)
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I’d think additional background in the Motivation
> > > >>> section
> > > >>>>>>>>> would
> > > >>>>>>>>>>>>>>> help
> > > >>>>>>>>>>>>>>>> me
> > > >>>>>>>>>>>>>>>>>>> understand how users might use this feature beyond a)
> > > >>>>>>>> skipping
> > > >>>>>>>>>>>>>>>>>> “oversized”
> > > >>>>>>>>>>>>>>>>>>> records, and b) not retrying missing topics.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Small change:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> -ProductionExceptionHandlerResponse -> Response for
> > > >>>>> brevity
> > > >>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>>> simplicity.
> > > >>>>>>>>>>>>>>>>>>>> Could’ve been HandlerResponse too I think!
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> The name change sounds good to me.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Thanks Alieh!
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> I thank you all again for your useful
> > > >>>>>>>> questions/suggestions.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> I would be happy to hear more of your concerns, as
> > > >>>> stated
> > > >>>>>>>> in
> > > >>>>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>>>>>> feedback.
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>>>>> Alieh
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Wed, Apr 24, 2024 at 12:31 AM Justine Olshan
> > > >>>>>>>>>>>>>>>>>>>> <jols...@confluent.io.invalid> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Thanks Alieh for the updates.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> I'm a little concerned about the design pattern
> > > >> here.
> > > >>>> It
> > > >>>>>>>>>>>>> seems
> > > >>>>>>>>>>>>>>>> like
> > > >>>>>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>>>>> want
> > > >>>>>>>>>>>>>>>>>>>>> specific usages, but we are packaging it as a
> > > >> generic
> > > >>>>>>>>>>>>> handler.
> > > >>>>>>>>>>>>>>>>>>>>> I think we tried to narrow down on the specific
> > > >>> errors
> > > >>>> we
> > > >>>>>>>>>>>>> want
> > > >>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>>>> handle,
> > > >>>>>>>>>>>>>>>>>>>>> but it feels a little clunky as we have a generic
> > > >>> thing
> > > >>>>>>>> for
> > > >>>>>>>>>>>>>> two
> > > >>>>>>>>>>>>>>>>>> specific
> > > >>>>>>>>>>>>>>>>>>>>> errors.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> I'm wondering if we are using the right patterns to
> > > >>>> solve
> > > >>>>>>>>>>>>>> these
> > > >>>>>>>>>>>>>>>>>>> problems. I
> > > >>>>>>>>>>>>>>>>>>>>> agree though that we will need something more than
> > > >>> the
> > > >>>>>>>> error
> > > >>>>>>>>>>>>>>>> classes
> > > >>>>>>>>>>>>>>>>>> I'm
> > > >>>>>>>>>>>>>>>>>>>>> proposing if we want to have different handling be
> > > >>>>>>>>>>>>>> configurable.
> > > >>>>>>>>>>>>>>>>>>>>> My concern is that the open-endedness of a handler
> > > >>>> means
> > > >>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>> are
> > > >>>>>>>>>>>>>>>>>>>>> creating more problems than we are solving. It is
> > > >>> still
> > > >>>>>>>>>>>>>> unclear
> > > >>>>>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>> me
> > > >>>>>>>>>>>>>>>>>>> how
> > > >>>>>>>>>>>>>>>>>>>>> we expect to handle the errors. Perhaps we could
> > > >>>> include
> > > >>>>>>>> an
> > > >>>>>>>>>>>>>>>> example?
> > > >>>>>>>>>>>>>>>>>> It
> > > >>>>>>>>>>>>>>>>>>>>> seems like there is a specific use case in mind and
> > > >>>> maybe
> > > >>>>>>>> we
> > > >>>>>>>>>>>>>> can
> > >
> > >
> > >
> >

Re: [DISCUSS] KIP-1038: Add Custom Error Handler to Producer

Reply via email to