I think some context is missing here (which can maybe be added in the
Motivation section as background): the deserialization is actually
done within Streams, not within the Consumer. Since the consumers
in Kafka Streams might be subscribed to multiple topics with different
data types, Streams has to use a ByteArrayDeserializer so that the consumer
just hands back the plain bytes; Kafka Streams then does all the
deserialization itself, based on its knowledge of the topic->serde mapping.
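To make that concrete, here's roughly what the setup looks like (just a
sketch, not the actual Streams code; the topic names and serde mapping
below are made up for illustration):

import java.time.Duration;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serdes;

public class ByteConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "sketch");
        // the consumer itself only ever sees plain bytes
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        // made-up topic->serde mapping; Streams derives this from the topology
        Map<String, Deserializer<?>> valueDeserializers = Map.of(
                "orders", Serdes.String().deserializer(),
                "counts", Serdes.Long().deserializer());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(valueDeserializers.keySet());
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<byte[], byte[]> record : records) {
                // deserialization happens here, outside the consumer
                Object value = valueDeserializers.get(record.topic())
                        .deserialize(record.topic(), record.value());
                System.out.println(record.topic() + " -> " + value);
            }
        }
    }
}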

Streams will just store the set of records returned by each #poll as plain
bytes, and only when a record is dequeued does it actually attempt to
deserialize it. So there's only one record being deserialized at
a time, and each one can be handled separately by the deserialization
exception handler, which is what KIP-334 relies on.

I do think it's worth saying something like that in the KIP somewhere,
since it's somewhat obscure knowledge of Kafka Streams internals
that not everyone will immediately know about. Feel free to just copy
what I wrote or write something more succinct 🙂

By the way, there are some minor formatting typos in the code snippet
for the KIP. Not a big deal, but if you're editing the KIP anyway, you
might as well fix those.

On Tue, Apr 16, 2024 at 1:33 AM Frédérik Rouleau
<froul...@confluent.io.invalid> wrote:

> Hi Almog,
>
> I think you do not understand the behavior that was introduced with
> KIP-334.
> When you have a DeserializationException, if you issue the proper seek call
> to skip the faulty record, the next poll call will return the remaining
> records to process, not a new list of records. When the KIP was
> released, I made a demo project
> (https://github.com/fred-ro/kafka-poison-pills), though I'm not sure it's
> still working; I should spend some time maintaining it. The only issue with
> the initial KIP is that you do not have access to the data of the faulty
> record, which makes DLQ implementation quite difficult.
>
> Regards,
> Fred
>
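For anyone skimming the thread, the consumer-side pattern Fred describes
above looks roughly like this (a sketch, using the
RecordDeserializationException that KIP-334 added; the processing is just
a placeholder):

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.RecordDeserializationException;

public class SkipPoisonPills {
    static void pollLoop(KafkaConsumer<String, String> consumer) {
        while (true) {
            try {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value()); // stand-in for real processing
                }
            } catch (RecordDeserializationException e) {
                // skip the poison pill; the next poll() continues with the
                // remaining already-fetched records, not a fresh list
                consumer.seek(e.topicPartition(), e.offset() + 1);
            }
        }
    }
}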
