Stanislav,

Any updates on this KIP? We have internal users who want to skip the
corrupted message while consuming the records.


On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax <matth...@confluent.io>
wrote:

> I am not 100% familiar with the details of the consumer code, however I
> tend to disagree with:
>
> > There's no difference between the two cases -- if (and only if) the
> message is corrupt, it can't be deserialized.  If (and only if) it can't be
> deserialized, it is corrupt.
>
> Assume that a user configures a JSON deserializer but a faulty upstream
> producer writes an Avro message. For this case, the message is not
> corrupted, but still can't be deserialized. And I can imaging that users
> want to handle both cases differently.
>
> Thus, I think it makes sense to have two different exceptions
> `RecordDeserializationException` and `CorruptedRecordException` that can
> both extend `FaultyRecordException` (don't like this name too much
> honestly, but don't have a better idea for it anyway).
>
> Side remark. If we introduce class `RecordDeserializationException` and
> `CorruptedRecordException`, we can also add an interface that both
> implement to return partition/offset information and let both extend
> `SerializationException` directly without an intermediate class in the
> exception hierarchy.
>
>
> -Matthias
>
> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote:
> >> If you are inheriting from SerializationException, your derived class
> > should also be a kind of serialization exception.  Not something more
> > general.
> > Yeah, the reason for inheriting it would be for backwards-compatibility.
> >
> >> Hmm.  Can you think of any new scenarios that would make Kafka force the
> > user need to skip a specific record?  Perhaps one scenario is if records
> > are lost but we don't know how many.
> > Not on the spot, but I do wonder how likely a new scenario is to surface
> in
> > the future and how we'd handle the exceptions' class hierarchy then.
> >
> >> Which offset were we planning to use in the
> > exception?
> > The offset of the record which caused the exception. In the case of
> > batches, we use the last offset of the batch. In both cases, users should
> > have to seek +1 from the given offset. You can review the PR to ensure
> its
> > accurate
> >
> >
> > If both of you prefer `RecordDeserializationException`, we can go with
> > that. Please do confirm that is okay
> >
> > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <ja...@confluent.io>
> wrote:
> >
> >> One difference between the two cases is that we can't generally trust
> the
> >> offset of a corrupt message. Which offset were we planning to use in the
> >> exception? Maybe it should be either the fetch offset or one plus the
> last
> >> consumed offset? I think I'm with Colin in preferring
> >> RecordDeserializationException for both cases if possible. For one
> thing,
> >> that makes the behavior consistent whether or not `check.crs` is
> enabled.
> >>
> >> -Jason
> >>
> >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <cmcc...@apache.org>
> wrote:
> >>
> >>> Hi Stanislav,
> >>>
> >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote:
> >>>> Hey Colin,
> >>>>
> >>>> It may be a bit vague but keep in mind we also raise the exception in
> >> the
> >>>> case where the record is corrupted.
> >>>> We discussed with Jason offline that message corruption most often
> >>> prevents
> >>>> deserialization itself and that may be enough of an argument to raise
> >>>> `RecordDeserializationException` in the case of a corrupt record. I
> >>>> personally find that misleading.
> >>>
> >>> Hmm.  I think that by definition, corrupt records are records that
> can't
> >>> be deserialized.  There's no difference between the two cases -- if
> (and
> >>> only if) the message is corrupt, it can't be deserialized.  If (and
> only
> >>> if) it can't be deserialized, it is corrupt.
> >>>
> >>>>
> >>>> In the end, I think it might be worth it to have a bit of a
> >>>> wider-encompassing `FaultyRecordException` (or even
> >>>> `UnconsumableRecordException`) which would allow users to skip over
> the
> >>>> record.
> >>>
> >>> If you are inheriting from SerializationException, your derived class
> >>> should also be a kind of serialization exception.  Not something more
> >>> general.
> >>>
> >>>> We could then potentially have more specific exceptions (e.g
> >>>> deserialization) inherit that but I'm not sure if we should.
> >>>> A case for a more general exception which provides access to the
> >>>> partition/offset is future backwards-compatibility. If there is ever a
> >>> new
> >>>> scenario that would make the user need to skip a specific record -
> >> there
> >>>> would already be such an exception and this will provide some
> >>>> backward-compatibility with older clients.
> >>>
> >>> Hmm.  Can you think of any new scenarios that would make Kafka force
> the
> >>> user need to skip a specific record?  Perhaps one scenario is if
> records
> >>> are lost but we don't know how many.
> >>>
> >>> If we really want to have something more general, perhaps we need
> >>> something like LostDataException.  But in that case, we can't inherit
> >> from
> >>> SerializationException, since lost data is more general than
> >> serialization
> >>> issues.
> >>>
> >>> best,
> >>> Colin
> >>>
> >>>
> >>>>
> >>>> Best,
> >>>> Stanislav
> >>>>
> >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <cmcc...@apache.org>
> >> wrote:
> >>>>
> >>>>> Hi Stanislav,
> >>>>>
> >>>>> Thanks for the KIP.
> >>>>>
> >>>>> As as user, "FaultyRecordException" seems a little vague.  What's
> >>> faulty
> >>>>> about it?  Is it too big?  Does it not match the schema of the other
> >>>>> records?  Was it not found?
> >>>>>
> >>>>> Of course, we know that the specific problem is with [de]serializing
> >>> the
> >>>>> record, not with any of those those things.  So we should choose a
> >> name
> >>>>> that reflects that serialization is the problem.  How about
> >>>>> RecordSerializationException?
> >>>>>
> >>>>> best,
> >>>>> Colin
> >>>>>
> >>>>>
> >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote:
> >>>>>> Hi Jason and Ted,
> >>>>>>
> >>>>>> @Jason
> >>>>>> I did not oppose the idea as I'm not too familiar with Java
> >>> conventions.
> >>>>> I
> >>>>>> agree it is a non-ideal way for the user to catch the exception so
> >> I
> >>>>>> changed it back.
> >>>>>>
> >>>>>> I've updated the KIP and PR with the latest changes. Now, there is
> >>> only
> >>>>> one
> >>>>>> public exception `FaultyRecordException` which is thrown to the
> >> user
> >>> in
> >>>>>> both scenarios (corrupt record and deserialization error).
> >>>>>> Please take a look again.
> >>>>>>
> >>>>>> Best,
> >>>>>> Stanislav
> >>>>>>
> >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io
> >>>
> >>>>> wrote:
> >>>>>>
> >>>>>>> Hey Stanislav,
> >>>>>>>
> >>>>>>> I should have noticed it earlier from your example, but I just
> >>> realized
> >>>>>>> that interfaces don't mix well with exceptions. There is no way
> >> to
> >>>>> catch
> >>>>>>> the interface type, which means you have to depend on instanceof
> >>>>> checks,
> >>>>>>> which is not very conventional. At the moment, we raise
> >>>>>>> SerializationException if there is a problem parsing the error,
> >>> and we
> >>>>>>> raise KafkaException if the record fails its CRC check. Since
> >>>>>>> SerializationException extends KafkaExeption, it seems like we
> >> can
> >>>>> handle
> >>>>>>> both cases in a compatible way by handling both cases with a
> >> single
> >>>>> type
> >>>>>>> that extends SerializationException. Maybe something like
> >>>>>>> RecordDeserializationException?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Jason
> >>>>>>>
> >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com>
> >>> wrote:
> >>>>>>>
> >>>>>>>> +1
> >>>>>>>> -------- Original message --------From: Stanislav Kozlovski <
> >>>>>>>> stanis...@confluent.io> Date: 8/2/18  2:41 AM  (GMT-08:00) To:
> >>>>>>>> dev@kafka.apache.org Subject: [VOTE] KIP-334 Include
> >> partitions
> >>> in
> >>>>>>>> exceptions raised during consumer record
> >>> deserialization/validation
> >>>>>>>> Hey everybody,
> >>>>>>>>
> >>>>>>>> I'd like to start a vote thread for KIP-334 Include partitions
> >> in
> >>>>>>>> exceptions raised during consumer record
> >>> deserialization/validation
> >>>>>>>> <
> >>>>>>>
> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?
> >>> pageId=87297793
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best,
> >>>>>>>> Stanislav
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Best,
> >>>>>> Stanislav
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best,
> >>>> Stanislav
> >>>
> >>
> >
> >
>
>

Reply via email to