[ 
https://issues.apache.org/jira/browse/KAFKA-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163096#comment-17163096
 ] 

Sophie Blee-Goldman commented on KAFKA-8037:
--------------------------------------------

By the way, I'm not trying to advocate strongly for any particular solution 
here. I just think we've started to demonize the source changelog optimization 
and to assume from the outset that it's inherently bad, which seems unfair. 
Sorry for pushing back on this (or, to borrow [~guozhang]'s favorite phrase, 
for "playing devil's advocate" :P) – it's partly for my own edification, but 
mostly to ensure we have a solid foundation to motivate the changes we want to 
make, instead of a vague claim that the optimization is bad and side effects 
are good (or at least, unavoidable).

Both of those things might be true – but if so, then we should be able to 
prove it, not take it for granted.

> KTable restore may load bad data
> --------------------------------
>
>                 Key: KAFKA-8037
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8037
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Minor
>              Labels: pull-request-available
>
> If an input topic contains bad data, users can specify a 
> `deserialization.exception.handler` to drop corrupted records on read. 
> However, this mechanism may be bypassed on restore. Assume a 
> `builder.table()` call reads and drops a corrupted record. If the table state 
> is lost and restored from the changelog topic, the corrupted record may be 
> copied into the store, because on restore plain bytes are copied.
> If the KTable is used in a join, an internal `store.get()` call to look up the 
> record would fail with a deserialization exception if the value part cannot 
> be deserialized.
> GlobalKTables are affected, too (cf. KAFKA-7663, which may allow a fix for the 
> GlobalKTable case). It's unclear to me at the moment how this issue could be 
> addressed for KTables, though.
> Note that user state stores are not affected, because they always have a 
> dedicated changelog topic (and don't reuse an input topic) and thus the 
> corrupted record would not be written into the changelog.
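
The failure mode in the issue description above can be illustrated with a 
small, self-contained simulation. This is only a sketch and does not use the 
Kafka Streams API: the class `RestoreBypassSketch` and its methods are 
hypothetical stand-ins, an in-memory list plays the input topic, and "valid" 
values are assumed to be integer strings. It shows that a read path which 
deserializes records drops the corrupted one, while a byte-copy restore puts 
it into the store, where a later lookup fails to deserialize it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation; none of these names come from Kafka Streams.
public class RestoreBypassSketch {

    // Stand-in for a value deserializer: only integer strings are valid.
    static int deserialize(byte[] raw) {
        return Integer.parseInt(new String(raw, StandardCharsets.UTF_8));
    }

    // Records are {key, value} pairs; "not-a-number" plays the corrupted record.
    static List<String[]> sampleTopic() {
        return Arrays.asList(
            new String[] {"a", "1"},
            new String[] {"b", "not-a-number"});
    }

    // Normal read path: each value is deserialized first, and a record that
    // fails is skipped, mimicking a handler that drops corrupted records.
    static Map<String, byte[]> processNormally(List<String[]> topic) {
        Map<String, byte[]> store = new HashMap<>();
        for (String[] record : topic) {
            byte[] raw = record[1].getBytes(StandardCharsets.UTF_8);
            try {
                deserialize(raw);
                store.put(record[0], raw);
            } catch (NumberFormatException e) {
                // Corrupted record dropped on read.
            }
        }
        return store;
    }

    // Restore path: plain bytes are copied, so nothing is deserialized.
    static Map<String, byte[]> restore(List<String[]> topic) {
        Map<String, byte[]> store = new HashMap<>();
        for (String[] record : topic) {
            store.put(record[0], record[1].getBytes(StandardCharsets.UTF_8));
        }
        return store;
    }

    // Stand-in for a join-time lookup: true only if the stored value
    // exists and can be deserialized.
    static boolean canRead(Map<String, byte[]> store, String key) {
        byte[] raw = store.get(key);
        if (raw == null) {
            return false;
        }
        try {
            deserialize(raw);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String[]> topic = sampleTopic();
        Map<String, byte[]> live = processNormally(topic);
        Map<String, byte[]> restored = restore(topic);
        System.out.println("live store has corrupted key: " + live.containsKey("b"));
        System.out.println("restored store has corrupted key: " + restored.containsKey("b"));
        System.out.println("restored value readable: " + canRead(restored, "b"));
    }
}
```

The sketch deliberately leaves out everything else about restoration 
(offsets, tombstones, the optimization config); it only contrasts a path that 
deserializes with one that copies bytes.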



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
