Re: [VOTE] KIP-334 Include partitions in exceptions raised during consumer record deserialization/validation

Kamal Chandraprakash Sun, 05 May 2019 22:41:07 -0700

Matthias,

We already have CorruptRecordException which doesn't extend the
SerializationException. So, we need an alternate
name suggestion for the corrupted record error if we decide to extend the
FaultyRecordException class.


Stanislav,

Our users are also facing this error. Could we bump up this discussion?

I think we can have a single exception type
FaultyRecordException/RecordDeserialization exception to capture both
the errors. We can add an additional enum field to differentiate the errors
if required.

Thanks,
Kamal Chandraprakash

On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash <
[email protected]> wrote:

> Stanislav,
>
> Any updates on this KIP? We have internal users who want to skip the
> corrupted message while consuming the records.
>
>
> On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax <[email protected]>
> wrote:
>
>> I am not 100% familiar with the details of the consumer code, however I
>> tend to disagree with:
>>
>> > There's no difference between the two cases -- if (and only if) the
>> message is corrupt, it can't be deserialized.  If (and only if) it can't be
>> deserialized, it is corrupt.
>>
>> Assume that a user configures a JSON deserializer but a faulty upstream
>> producer writes an Avro message. For this case, the message is not
>> corrupted, but still can't be deserialized. And I can imaging that users
>> want to handle both cases differently.
>>
>> Thus, I think it makes sense to have two different exceptions
>> `RecordDeserializationException` and `CorruptedRecordException` that can
>> both extend `FaultyRecordException` (don't like this name too much
>> honestly, but don't have a better idea for it anyway).
>>
>> Side remark. If we introduce class `RecordDeserializationException` and
>> `CorruptedRecordException`, we can also add an interface that both
>> implement to return partition/offset information and let both extend
>> `SerializationException` directly without an intermediate class in the
>> exception hierarchy.
>>
>>
>> -Matthias
>>
>> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote:
>> >> If you are inheriting from SerializationException, your derived class
>> > should also be a kind of serialization exception.  Not something more
>> > general.
>> > Yeah, the reason for inheriting it would be for backwards-compatibility.
>> >
>> >> Hmm.  Can you think of any new scenarios that would make Kafka force
>> the
>> > user need to skip a specific record?  Perhaps one scenario is if records
>> > are lost but we don't know how many.
>> > Not on the spot, but I do wonder how likely a new scenario is to
>> surface in
>> > the future and how we'd handle the exceptions' class hierarchy then.
>> >
>> >> Which offset were we planning to use in the
>> > exception?
>> > The offset of the record which caused the exception. In the case of
>> > batches, we use the last offset of the batch. In both cases, users
>> should
>> > have to seek +1 from the given offset. You can review the PR to ensure
>> its
>> > accurate
>> >
>> >
>> > If both of you prefer `RecordDeserializationException`, we can go with
>> > that. Please do confirm that is okay
>> >
>> > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <[email protected]>
>> wrote:
>> >
>> >> One difference between the two cases is that we can't generally trust
>> the
>> >> offset of a corrupt message. Which offset were we planning to use in
>> the
>> >> exception? Maybe it should be either the fetch offset or one plus the
>> last
>> >> consumed offset? I think I'm with Colin in preferring
>> >> RecordDeserializationException for both cases if possible. For one
>> thing,
>> >> that makes the behavior consistent whether or not `check.crs` is
>> enabled.
>> >>
>> >> -Jason
>> >>
>> >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <[email protected]>
>> wrote:
>> >>
>> >>> Hi Stanislav,
>> >>>
>> >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote:
>> >>>> Hey Colin,
>> >>>>
>> >>>> It may be a bit vague but keep in mind we also raise the exception in
>> >> the
>> >>>> case where the record is corrupted.
>> >>>> We discussed with Jason offline that message corruption most often
>> >>> prevents
>> >>>> deserialization itself and that may be enough of an argument to raise
>> >>>> `RecordDeserializationException` in the case of a corrupt record. I
>> >>>> personally find that misleading.
>> >>>
>> >>> Hmm.  I think that by definition, corrupt records are records that
>> can't
>> >>> be deserialized.  There's no difference between the two cases -- if
>> (and
>> >>> only if) the message is corrupt, it can't be deserialized.  If (and
>> only
>> >>> if) it can't be deserialized, it is corrupt.
>> >>>
>> >>>>
>> >>>> In the end, I think it might be worth it to have a bit of a
>> >>>> wider-encompassing `FaultyRecordException` (or even
>> >>>> `UnconsumableRecordException`) which would allow users to skip over
>> the
>> >>>> record.
>> >>>
>> >>> If you are inheriting from SerializationException, your derived class
>> >>> should also be a kind of serialization exception.  Not something more
>> >>> general.
>> >>>
>> >>>> We could then potentially have more specific exceptions (e.g
>> >>>> deserialization) inherit that but I'm not sure if we should.
>> >>>> A case for a more general exception which provides access to the
>> >>>> partition/offset is future backwards-compatibility. If there is ever
>> a
>> >>> new
>> >>>> scenario that would make the user need to skip a specific record -
>> >> there
>> >>>> would already be such an exception and this will provide some
>> >>>> backward-compatibility with older clients.
>> >>>
>> >>> Hmm.  Can you think of any new scenarios that would make Kafka force
>> the
>> >>> user need to skip a specific record?  Perhaps one scenario is if
>> records
>> >>> are lost but we don't know how many.
>> >>>
>> >>> If we really want to have something more general, perhaps we need
>> >>> something like LostDataException.  But in that case, we can't inherit
>> >> from
>> >>> SerializationException, since lost data is more general than
>> >> serialization
>> >>> issues.
>> >>>
>> >>> best,
>> >>> Colin
>> >>>
>> >>>
>> >>>>
>> >>>> Best,
>> >>>> Stanislav
>> >>>>
>> >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <[email protected]>
>> >> wrote:
>> >>>>
>> >>>>> Hi Stanislav,
>> >>>>>
>> >>>>> Thanks for the KIP.
>> >>>>>
>> >>>>> As as user, "FaultyRecordException" seems a little vague.  What's
>> >>> faulty
>> >>>>> about it?  Is it too big?  Does it not match the schema of the other
>> >>>>> records?  Was it not found?
>> >>>>>
>> >>>>> Of course, we know that the specific problem is with [de]serializing
>> >>> the
>> >>>>> record, not with any of those those things.  So we should choose a
>> >> name
>> >>>>> that reflects that serialization is the problem.  How about
>> >>>>> RecordSerializationException?
>> >>>>>
>> >>>>> best,
>> >>>>> Colin
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote:
>> >>>>>> Hi Jason and Ted,
>> >>>>>>
>> >>>>>> @Jason
>> >>>>>> I did not oppose the idea as I'm not too familiar with Java
>> >>> conventions.
>> >>>>> I
>> >>>>>> agree it is a non-ideal way for the user to catch the exception so
>> >> I
>> >>>>>> changed it back.
>> >>>>>>
>> >>>>>> I've updated the KIP and PR with the latest changes. Now, there is
>> >>> only
>> >>>>> one
>> >>>>>> public exception `FaultyRecordException` which is thrown to the
>> >> user
>> >>> in
>> >>>>>> both scenarios (corrupt record and deserialization error).
>> >>>>>> Please take a look again.
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Stanislav
>> >>>>>>
>> >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson <[email protected]
>> >>>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>>> Hey Stanislav,
>> >>>>>>>
>> >>>>>>> I should have noticed it earlier from your example, but I just
>> >>> realized
>> >>>>>>> that interfaces don't mix well with exceptions. There is no way
>> >> to
>> >>>>> catch
>> >>>>>>> the interface type, which means you have to depend on instanceof
>> >>>>> checks,
>> >>>>>>> which is not very conventional. At the moment, we raise
>> >>>>>>> SerializationException if there is a problem parsing the error,
>> >>> and we
>> >>>>>>> raise KafkaException if the record fails its CRC check. Since
>> >>>>>>> SerializationException extends KafkaExeption, it seems like we
>> >> can
>> >>>>> handle
>> >>>>>>> both cases in a compatible way by handling both cases with a
>> >> single
>> >>>>> type
>> >>>>>>> that extends SerializationException. Maybe something like
>> >>>>>>> RecordDeserializationException?
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> Jason
>> >>>>>>>
>> >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <[email protected]>
>> >>> wrote:
>> >>>>>>>
>> >>>>>>>> +1
>> >>>>>>>> -------- Original message --------From: Stanislav Kozlovski <
>> >>>>>>>> [email protected]> Date: 8/2/18  2:41 AM  (GMT-08:00) To:
>> >>>>>>>> [email protected] Subject: [VOTE] KIP-334 Include
>> >> partitions
>> >>> in
>> >>>>>>>> exceptions raised during consumer record
>> >>> deserialization/validation
>> >>>>>>>> Hey everybody,
>> >>>>>>>>
>> >>>>>>>> I'd like to start a vote thread for KIP-334 Include partitions
>> >> in
>> >>>>>>>> exceptions raised during consumer record
>> >>> deserialization/validation
>> >>>>>>>> <
>> >>>>>>>
>> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?
>> >>> pageId=87297793
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Best,
>> >>>>>>>> Stanislav
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Best,
>> >>>>>> Stanislav
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Best,
>> >>>> Stanislav
>> >>>
>> >>
>> >
>> >
>>
>>

Re: [VOTE] KIP-334 Include partitions in exceptions raised during consumer record deserialization/validation

Reply via email to