[jira] [Commented] (KAFKA-8037) KTable restore may load bad data

Sophie Blee-Goldman (Jira) Wed, 22 Jul 2020 14:48:35 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163090#comment-17163090
 ]


Sophie Blee-Goldman commented on KAFKA-8037:
--------------------------------------------

[~vvcephei] I think you're referring to the proposal to make a call based on 
whether two serdes are identical, which, just to clarify, was not my proposal. I 
don't think there's any reasonable way to do that. I also don't think it's 
necessary.

As for the ways in which the source topic can differ from the changelog, are 
there any besides asymmetric/side-effect serdes and corrupt data? The 
second can be handled by the deserialization exception handler, whereas 
the first still seems like an inappropriate use of Streams to me. Note that we 
_do_ (and must) enforce that key serdes are completely symmetric (for 
partitioning reasons), so I don't see why we would make an exception for value 
serdes. That seems like inviting trouble (or specifically, inviting users to 
accidentally mispartition their data by assuming, fairly, that the 
same rules apply to key and value serdes).
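
To make the asymmetric case concrete, here is a minimal sketch in plain Java. 
It does not use any Kafka Streams classes: `AsymmetricSerdeSketch`, its 
whitespace-trimming deserializer, and the `roundTrips` helper are all 
hypothetical names invented for illustration. It only shows why the bytes in a 
source topic may differ from the bytes a dedicated changelog would have held 
once deserialization is lossy.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; these are not Kafka Streams classes.
public class AsymmetricSerdeSketch {

    // A lossy deserializer: trimming whitespace makes the serde asymmetric.
    static String deserialize(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).trim();
    }

    static byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // True when serialize(deserialize(bytes)) reproduces the source bytes,
    // i.e. when a dedicated changelog would hold the same bytes as the source topic.
    static boolean roundTrips(byte[] sourceBytes) {
        return Arrays.equals(sourceBytes, serialize(deserialize(sourceBytes)));
    }

    public static void main(String[] args) {
        byte[] clean = serialize("alice");
        byte[] padded = serialize("  alice  ");
        System.out.println("clean round-trips:  " + roundTrips(clean));
        System.out.println("padded round-trips: " + roundTrips(padded));
    }
}
```

With a serde like this, a store restored by copying source-topic bytes would 
contain the padded value, while a store built through normal processing would 
contain the trimmed one.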

The side-effect possibility is one I'm still unsure about, however. This 
problem might be unavoidable. Hopefully [~agavra] or someone else can explain 
what kinds of side effects we see in serdes.

> KTable restore may load bad data
> --------------------------------
>
>                 Key: KAFKA-8037
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8037
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Minor
>              Labels: pull-request-available
>
> If an input topic contains bad data, users can specify a 
> `deserialization.exception.handler` to drop corrupted records on read. 
> However, this mechanism may be bypassed on restore. Assume a 
> `builder.table()` call reads and drops a corrupted record. If the table state 
> is lost and restored from the changelog topic, the corrupted record may be 
> copied into the store, because on restore plain bytes are copied.
> If the KTable is used in a join, an internal `store.get()` call to look up the 
> record would fail with a deserialization exception if the value part cannot 
> be deserialized.
> GlobalKTables are affected, too (cf. KAFKA-7663 that may allow a fix for 
> GlobalKTable case). It's unclear to me at the moment how this issue could be 
> addressed for KTables though.
> Note that user state stores are not affected, because they always have a 
> dedicated changelog topic (and don't reuse an input topic) and thus the 
> corrupted record would not be written into the changelog.
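
The scenario in the description above can be sketched as a small plain-Java 
simulation. Nothing here is Kafka Streams API: `RestoreBypassSketch`, 
`processWithHandler`, and `restoreRaw` are hypothetical names, a map stands in 
for the topic, and integer parsing stands in for the value deserializer. The 
point is only that a read path which drops undeserializable records and a 
restore path which copies plain bytes can leave different contents in the 
store.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation only; no Kafka Streams classes are used.
public class RestoreBypassSketch {

    // Stand-in for a value deserializer: fails on anything that is not an integer.
    static int deserialize(byte[] bytes) {
        return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8));
    }

    // Normal read path: records that fail deserialization are dropped,
    // mimicking a "skip and continue" deserialization exception handler.
    static Map<String, byte[]> processWithHandler(Map<String, byte[]> topic) {
        Map<String, byte[]> store = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> record : topic.entrySet()) {
            try {
                deserialize(record.getValue());
                store.put(record.getKey(), record.getValue());
            } catch (NumberFormatException e) {
                // Corrupted record: skipped, never reaches the store.
            }
        }
        return store;
    }

    // Restore path: plain bytes are copied without deserialization.
    static Map<String, byte[]> restoreRaw(Map<String, byte[]> topic) {
        return new LinkedHashMap<>(topic);
    }

    // A tiny input "topic" with one corrupted value under key k2.
    static Map<String, byte[]> sampleTopic() {
        Map<String, byte[]> topic = new LinkedHashMap<>();
        topic.put("k1", "1".getBytes(StandardCharsets.UTF_8));
        topic.put("k2", "not-a-number".getBytes(StandardCharsets.UTF_8));
        topic.put("k3", "3".getBytes(StandardCharsets.UTF_8));
        return topic;
    }

    public static void main(String[] args) {
        Map<String, byte[]> processed = processWithHandler(sampleTopic());
        Map<String, byte[]> restored = restoreRaw(sampleTopic());
        System.out.println("processed store has k2: " + processed.containsKey("k2"));
        System.out.println("restored store has k2:  " + restored.containsKey("k2"));
    }
}
```

In this sketch, a later lookup of `k2` in the restored store followed by 
`deserialize` would throw, which mirrors the join failure described in the 
issue.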



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
