[ https://issues.apache.org/jira/browse/KAFKA-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042347#comment-17042347 ]
Sophie Blee-Goldman commented on KAFKA-8037:
--------------------------------------------

Right, duh, we definitely wouldn't want the offset commits to be uncompacted. Scratch that last – maybe we could still get away with it but it'd be tight. Is the size just bounded by the usual max.message size or is there some tighter limit on the metadata?

I suppose we could just tell users that this isn't a solution to large amounts of bad data. If they expect a lot of it to be corrupted, it seems reasonable to either turn off optimization (which unfortunately can't yet be done at the individual table level) or insert a filtration step with the "good" data materialized and changelog-ed.

But it's probably not reasonable to expect users to be able to predict how much of the data is corrupted beforehand. The "inverse-changelog" is more reliable for the general use case.

> KTable restore may load bad data
> --------------------------------
>
> Key: KAFKA-8037
> URL: https://issues.apache.org/jira/browse/KAFKA-8037
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Matthias J. Sax
> Priority: Minor
> Labels: pull-request-available
>
> If an input topic contains bad data, users can specify a
> `deserialization.exception.handler` to drop corrupted records on read.
> However, this mechanism may be bypassed on restore. Assume a
> `builder.table()` call reads and drops a corrupted record. If the table state
> is lost and restored from the changelog topic, the corrupted record may be
> copied into the store, because on restore plain bytes are copied.
> If the KTable is used in a join, an internal `store.get()` call to look up the
> record would fail with a deserialization exception if the value part cannot
> be deserialized.
> GlobalKTables are affected, too (cf. KAFKA-7663 that may allow a fix for
> GlobalKTable case). It's unclear to me atm how this issue could be addressed
> for KTables though.
> Note that user state stores are not affected, because they always have a
> dedicated changelog topic (and don't reuse an input topic) and thus the
> corrupted record would not be written into the changelog.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
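
For anyone following the thread, the restore behaviour described in the issue can be illustrated with a toy model. This is only a minimal sketch in plain Java with no Kafka dependency: `RestoreSketch`, `deserialize`, `readWithHandler`, `restoreRawBytes`, and `sampleTopic` are hypothetical names, not Kafka Streams APIs, and the "a valid value is exactly 4 bytes" format is an assumption made purely for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the issue. None of these names are Kafka Streams APIs. */
public class RestoreSketch {

    /** Hypothetical deserializer: a valid value is exactly 4 bytes (one int). */
    public static int deserialize(byte[] value) {
        if (value == null || value.length != 4) {
            throw new IllegalArgumentException("corrupted value");
        }
        return ByteBuffer.wrap(value).getInt();
    }

    /**
     * Regular read path: every record is deserialized, and records that fail
     * are skipped, roughly what a "drop corrupted records" handler achieves.
     */
    public static Map<String, Integer> readWithHandler(List<Map.Entry<String, byte[]>> topic) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> record : topic) {
            try {
                table.put(record.getKey(), deserialize(record.getValue()));
            } catch (IllegalArgumentException e) {
                // Corrupted record is dropped on read.
            }
        }
        return table;
    }

    /** Restore path: bytes are copied as-is, so nothing is ever deserialized. */
    public static Map<String, byte[]> restoreRawBytes(List<Map.Entry<String, byte[]>> topic) {
        Map<String, byte[]> store = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> record : topic) {
            store.put(record.getKey(), record.getValue());
        }
        return store;
    }

    /** A source topic that doubles as the changelog: one good record, one corrupted. */
    public static List<Map.Entry<String, byte[]>> sampleTopic() {
        List<Map.Entry<String, byte[]>> topic = new ArrayList<>();
        topic.add(Map.entry("a", ByteBuffer.allocate(4).putInt(1).array()));
        topic.add(Map.entry("b", new byte[] {0x7f})); // too short, cannot be deserialized
        return topic;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, byte[]>> topic = sampleTopic();
        System.out.println("keys after normal read: " + readWithHandler(topic).keySet());

        Map<String, byte[]> restored = restoreRawBytes(topic);
        System.out.println("keys after restore: " + restored.keySet());

        try {
            deserialize(restored.get("b")); // stands in for the lookup done by a join
        } catch (IllegalArgumentException e) {
            System.out.println("lookup of restored record failed: " + e.getMessage());
        }
    }
}
```

In this model the corrupted record never reaches the table on the normal read path, but it does land in the restored store because the restore loop never calls the deserializer. That is the asymmetry the issue describes. A store with a dedicated changelog would not show it, since only records that were actually written to the store end up in that changelog.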