Stanislav, Any updates on this KIP? We have internal users who want to skip the corrupted message while consuming the records.
On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax <matth...@confluent.io> wrote: > I am not 100% familiar with the details of the consumer code, however I > tend to disagree with: > > > There's no difference between the two cases -- if (and only if) the > message is corrupt, it can't be deserialized. If (and only if) it can't be > deserialized, it is corrupt. > > Assume that a user configures a JSON deserializer but a faulty upstream > producer writes an Avro message. For this case, the message is not > corrupted, but still can't be deserialized. And I can imaging that users > want to handle both cases differently. > > Thus, I think it makes sense to have two different exceptions > `RecordDeserializationException` and `CorruptedRecordException` that can > both extend `FaultyRecordException` (don't like this name too much > honestly, but don't have a better idea for it anyway). > > Side remark. If we introduce class `RecordDeserializationException` and > `CorruptedRecordException`, we can also add an interface that both > implement to return partition/offset information and let both extend > `SerializationException` directly without an intermediate class in the > exception hierarchy. > > > -Matthias > > On 8/8/18 2:57 AM, Stanislav Kozlovski wrote: > >> If you are inheriting from SerializationException, your derived class > > should also be a kind of serialization exception. Not something more > > general. > > Yeah, the reason for inheriting it would be for backwards-compatibility. > > > >> Hmm. Can you think of any new scenarios that would make Kafka force the > > user need to skip a specific record? Perhaps one scenario is if records > > are lost but we don't know how many. > > Not on the spot, but I do wonder how likely a new scenario is to surface > in > > the future and how we'd handle the exceptions' class hierarchy then. > > > >> Which offset were we planning to use in the > > exception? > > The offset of the record which caused the exception. In the case of > > batches, we use the last offset of the batch. In both cases, users should > > have to seek +1 from the given offset. You can review the PR to ensure > its > > accurate > > > > > > If both of you prefer `RecordDeserializationException`, we can go with > > that. Please do confirm that is okay > > > > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <ja...@confluent.io> > wrote: > > > >> One difference between the two cases is that we can't generally trust > the > >> offset of a corrupt message. Which offset were we planning to use in the > >> exception? Maybe it should be either the fetch offset or one plus the > last > >> consumed offset? I think I'm with Colin in preferring > >> RecordDeserializationException for both cases if possible. For one > thing, > >> that makes the behavior consistent whether or not `check.crs` is > enabled. > >> > >> -Jason > >> > >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <cmcc...@apache.org> > wrote: > >> > >>> Hi Stanislav, > >>> > >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote: > >>>> Hey Colin, > >>>> > >>>> It may be a bit vague but keep in mind we also raise the exception in > >> the > >>>> case where the record is corrupted. > >>>> We discussed with Jason offline that message corruption most often > >>> prevents > >>>> deserialization itself and that may be enough of an argument to raise > >>>> `RecordDeserializationException` in the case of a corrupt record. I > >>>> personally find that misleading. > >>> > >>> Hmm. I think that by definition, corrupt records are records that > can't > >>> be deserialized. There's no difference between the two cases -- if > (and > >>> only if) the message is corrupt, it can't be deserialized. If (and > only > >>> if) it can't be deserialized, it is corrupt. > >>> > >>>> > >>>> In the end, I think it might be worth it to have a bit of a > >>>> wider-encompassing `FaultyRecordException` (or even > >>>> `UnconsumableRecordException`) which would allow users to skip over > the > >>>> record. > >>> > >>> If you are inheriting from SerializationException, your derived class > >>> should also be a kind of serialization exception. Not something more > >>> general. > >>> > >>>> We could then potentially have more specific exceptions (e.g > >>>> deserialization) inherit that but I'm not sure if we should. > >>>> A case for a more general exception which provides access to the > >>>> partition/offset is future backwards-compatibility. If there is ever a > >>> new > >>>> scenario that would make the user need to skip a specific record - > >> there > >>>> would already be such an exception and this will provide some > >>>> backward-compatibility with older clients. > >>> > >>> Hmm. Can you think of any new scenarios that would make Kafka force > the > >>> user need to skip a specific record? Perhaps one scenario is if > records > >>> are lost but we don't know how many. > >>> > >>> If we really want to have something more general, perhaps we need > >>> something like LostDataException. But in that case, we can't inherit > >> from > >>> SerializationException, since lost data is more general than > >> serialization > >>> issues. > >>> > >>> best, > >>> Colin > >>> > >>> > >>>> > >>>> Best, > >>>> Stanislav > >>>> > >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <cmcc...@apache.org> > >> wrote: > >>>> > >>>>> Hi Stanislav, > >>>>> > >>>>> Thanks for the KIP. > >>>>> > >>>>> As as user, "FaultyRecordException" seems a little vague. What's > >>> faulty > >>>>> about it? Is it too big? Does it not match the schema of the other > >>>>> records? Was it not found? > >>>>> > >>>>> Of course, we know that the specific problem is with [de]serializing > >>> the > >>>>> record, not with any of those those things. So we should choose a > >> name > >>>>> that reflects that serialization is the problem. How about > >>>>> RecordSerializationException? > >>>>> > >>>>> best, > >>>>> Colin > >>>>> > >>>>> > >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote: > >>>>>> Hi Jason and Ted, > >>>>>> > >>>>>> @Jason > >>>>>> I did not oppose the idea as I'm not too familiar with Java > >>> conventions. > >>>>> I > >>>>>> agree it is a non-ideal way for the user to catch the exception so > >> I > >>>>>> changed it back. > >>>>>> > >>>>>> I've updated the KIP and PR with the latest changes. Now, there is > >>> only > >>>>> one > >>>>>> public exception `FaultyRecordException` which is thrown to the > >> user > >>> in > >>>>>> both scenarios (corrupt record and deserialization error). > >>>>>> Please take a look again. > >>>>>> > >>>>>> Best, > >>>>>> Stanislav > >>>>>> > >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson <ja...@confluent.io > >>> > >>>>> wrote: > >>>>>> > >>>>>>> Hey Stanislav, > >>>>>>> > >>>>>>> I should have noticed it earlier from your example, but I just > >>> realized > >>>>>>> that interfaces don't mix well with exceptions. There is no way > >> to > >>>>> catch > >>>>>>> the interface type, which means you have to depend on instanceof > >>>>> checks, > >>>>>>> which is not very conventional. At the moment, we raise > >>>>>>> SerializationException if there is a problem parsing the error, > >>> and we > >>>>>>> raise KafkaException if the record fails its CRC check. Since > >>>>>>> SerializationException extends KafkaExeption, it seems like we > >> can > >>>>> handle > >>>>>>> both cases in a compatible way by handling both cases with a > >> single > >>>>> type > >>>>>>> that extends SerializationException. Maybe something like > >>>>>>> RecordDeserializationException? > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Jason > >>>>>>> > >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com> > >>> wrote: > >>>>>>> > >>>>>>>> +1 > >>>>>>>> -------- Original message --------From: Stanislav Kozlovski < > >>>>>>>> stanis...@confluent.io> Date: 8/2/18 2:41 AM (GMT-08:00) To: > >>>>>>>> dev@kafka.apache.org Subject: [VOTE] KIP-334 Include > >> partitions > >>> in > >>>>>>>> exceptions raised during consumer record > >>> deserialization/validation > >>>>>>>> Hey everybody, > >>>>>>>> > >>>>>>>> I'd like to start a vote thread for KIP-334 Include partitions > >> in > >>>>>>>> exceptions raised during consumer record > >>> deserialization/validation > >>>>>>>> < > >>>>>>> > >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action? > >>> pageId=87297793 > >>>>>>>>> > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Best, > >>>>>>>> Stanislav > >>>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> Best, > >>>>>> Stanislav > >>>>> > >>>> > >>>> > >>>> -- > >>>> Best, > >>>> Stanislav > >>> > >> > > > > > >