I think some context is missing here (it could maybe be added in the Motivation section as background): the deserialization is actually done within Streams, not within the Consumer. Since the consumers in Kafka Streams might be subscribed to multiple topics with different data types, Streams has to use a ByteArrayDeserializer so that the consumer just hands back the plain bytes, and then Kafka Streams can do all the deserialization based on its knowledge of the topic->serde mapping.
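To make that concrete, here's a minimal plain-Java sketch (hypothetical names, not the actual Streams code) of the idea: the consumer only ever returns raw bytes, and a topic->deserializer mapping decides how each record's bytes get turned into a typed value:

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Function;

public class TopicSerdeSketch {
    public static void main(String[] args) {
        // Hypothetical stand-in for Streams' topic->serde mapping.
        Map<String, Function<byte[], Object>> serdes =
            Map.<String, Function<byte[], Object>>of(
                "text-topic", bytes -> new String(bytes, StandardCharsets.UTF_8),
                "int-topic",  bytes -> java.nio.ByteBuffer.wrap(bytes).getInt()
            );

        // What a ByteArrayDeserializer-configured consumer hands back: plain bytes.
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);

        // Streams-side deserialization, chosen per topic.
        Object value = serdes.get("text-topic").apply(raw);
        System.out.println(value); // hello
    }
}
```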
Streams will just store the set of records returned by each #poll as plain bytes, and only when a record is dequeued does it actually attempt to deserialize it. So there's only one record being deserialized at a time, and each one can be handled separately by the deserialization exception handler, which is what KIP-334 relies on.

I do think it's worth saying something like that in the KIP somewhere, since it's somewhat obscure knowledge of Kafka Streams internals that not everyone will immediately know about. Feel free to just copy what I wrote or write something more succinct 🙂

By the way, there are some minor formatting typos in the code snippet for the KIP. Not a big deal, but if you're editing the KIP anyway you might as well fix them.

On Tue, Apr 16, 2024 at 1:33 AM Frédérik Rouleau <froul...@confluent.io.invalid> wrote:

> Hi Almog,
>
> I think you may have misunderstood the behavior that was introduced with
> KIP-334. When you have a DeserializationException, if you issue the proper
> seek call to skip the faulty record, the next poll call will return the
> remaining records to process, not a new list of records. When the KIP was
> released, I made a demo project,
> https://github.com/fred-ro/kafka-poison-pills (not sure it's still
> working, I should spend time maintaining it). The only issue with the
> initial KIP is that you do not have access to the data of the faulty
> record, which makes DLQ implementation quite difficult.
>
> Regards,
> Fred
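To illustrate the buffered-bytes point above, here's a minimal plain-Java sketch (hypothetical names, not the actual Streams implementation): records sit in a buffer as raw bytes, get deserialized one at a time on dequeue, and a handler decides whether a single poison pill is skipped (CONTINUE) or surfaced (FAIL), loosely mirroring Streams' DeserializationExceptionHandler responses:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class LazyDeserSketch {
    enum HandlerResponse { CONTINUE, FAIL }

    public static void main(String[] args) {
        // Buffered as plain bytes, like records returned by #poll.
        ArrayDeque<byte[]> buffered = new ArrayDeque<>(List.of(
            "1".getBytes(StandardCharsets.UTF_8),
            "oops".getBytes(StandardCharsets.UTF_8), // poison pill
            "3".getBytes(StandardCharsets.UTF_8)
        ));
        HandlerResponse onError = HandlerResponse.CONTINUE; // skip faulty records
        List<Integer> processed = new ArrayList<>();

        while (!buffered.isEmpty()) {
            byte[] raw = buffered.poll();
            try {
                // Deserialization happens per record, only at dequeue time.
                processed.add(Integer.parseInt(new String(raw, StandardCharsets.UTF_8)));
            } catch (NumberFormatException e) {
                if (onError == HandlerResponse.FAIL) throw e;
                // CONTINUE: only this one record is dropped; the rest proceed.
            }
        }
        System.out.println(processed); // [1, 3]
    }
}
```

Because only one record is in flight at a time, the handler never has to reason about a whole batch, which is the property the discussion above hinges on.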