[ https://issues.apache.org/jira/browse/KAFKA-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Ellis updated KAFKA-13320:
------------------------------
    Description:
While working with a JDBC Sink Connector, I noticed that some SMT choke on a tombstone (null value) while others handle tombstones fine.

For example:
{code:javascript}
"transforms": "flattenKey,valueToJSON,wrapValue,addTimestamp",
"transforms.flattenKey.type": "org.apache.kafka.connect.transforms.Flatten$Key",
"transforms.flattenKey.delimiter": "_",
"transforms.valueToJSON.type": "com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Value",
"transforms.valueToJSON.schemas.enable": "false",
"transforms.valueToJSON.predicate": "tombstone",
"transforms.valueToJSON.negate": true,
"transforms.wrapValue.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.wrapValue.field": "matrix",
"transforms.wrapValue.predicate": "tombstone",
"transforms.wrapValue.negate": true,
"transforms.addTimestamp.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addTimestamp.timestamp.field": "message_timestamp",
"predicates": "tombstone",
"predicates.tombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
{code}
To avoid the cryptic error “java.lang.ClassCastException: class java.util.HashMap cannot be cast to class org.apache.kafka.connect.data.Struct” when processing a tombstone record, I had to add a negated predicate of RecordIsTombstone for ToJSON (community SMT) and HoistField, but did not need to add that to InsertField.

Digging in the source, I find that InsertField handles the case where key or value is null:

[https://github.com/a0x8o/kafka/blob/f8237749f6ad34c09154f807e53273be64e1261e/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/InsertField.java#L130]

^ Thanks to this, there's no need to add a predicate to skip InsertField$Value when value is null.

It would help if the docs listed how the individual SMTs behave when dealing with a null key/value.
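The kind of null guard described above can be pictured with a small, self-contained sketch. It deliberately does not use the Kafka Connect API: the `Rec` class, the `insertTimestamp` method, and the sample values are hypothetical stand-ins, and the actual InsertField code at the linked line may differ in detail. The only point is the shape of the check: return the record untouched when its value is null, so a tombstone passes through without needing a predicate.

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeTransformSketch {

    /** Hypothetical stand-in for a Connect record: a key plus a schemaless (Map) value. */
    static final class Rec {
        final String key;
        final Map<String, Object> value; // null models a tombstone

        Rec(String key, Map<String, Object> value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Adds a timestamp field to the value; a no-op when the value is null. */
    static Rec insertTimestamp(Rec record, String field, long timestampMs) {
        if (record.value == null) {
            return record; // tombstone: pass through unchanged
        }
        // Copy the value so the input record is not mutated.
        Map<String, Object> updated = new HashMap<>(record.value);
        updated.put(field, timestampMs);
        return new Rec(record.key, updated);
    }

    public static void main(String[] args) {
        Map<String, Object> payload = new HashMap<>();
        payload.put("matrix", "{}");

        Rec tombstone = new Rec("id_1", null);
        Rec normal = new Rec("id_2", payload);

        // The tombstone comes back as the same object, value still null.
        System.out.println(insertTimestamp(tombstone, "message_timestamp", 1L) == tombstone);
        // The normal record gains the extra field.
        System.out.println(insertTimestamp(normal, "message_timestamp", 1L).value);
    }
}
```

A transform without such a guard would have to be wrapped in a negated RecordIsTombstone predicate, as was done for ToJSON and HoistField in the configuration above.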
Of course we can always find this out by trial and error or by studying the source code. But if we were to make a best practice of describing how an SMT handles null key/value, that would have two benefits:

1) Save developers time when working with the official (shipped with Kafka) SMT
2) Inspire developers who write their own SMT to likewise document how they handle null key/value

Perhaps a standard way of dealing with nulls ("no-op if key/value is null") could be promoted, and SMT authors would only need to document their behavior when it differs.

> Suggestion: SMT support for null key/value should be documented
> ---------------------------------------------------------------
>
>                 Key: KAFKA-13320
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13320
>             Project: Kafka
>          Issue Type: Wish
>          Components: KafkaConnect
>            Reporter: Ben Ellis
>            Priority: Minor
>
> While working with a JDBC Sink Connector, I noticed that some SMT choke on a
> tombstone (null value) while others handle tombstones fine.
> For example:
> {code:javascript}
> "transforms": "flattenKey,valueToJSON,wrapValue,addTimestamp",
> "transforms.flattenKey.type": "org.apache.kafka.connect.transforms.Flatten$Key",
> "transforms.flattenKey.delimiter": "_",
> "transforms.valueToJSON.type": "com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Value",
> "transforms.valueToJSON.schemas.enable": "false",
> "transforms.valueToJSON.predicate": "tombstone",
> "transforms.valueToJSON.negate": true,
> "transforms.wrapValue.type": "org.apache.kafka.connect.transforms.HoistField$Value",
> "transforms.wrapValue.field": "matrix",
> "transforms.wrapValue.predicate": "tombstone",
> "transforms.wrapValue.negate": true,
> "transforms.addTimestamp.type": "org.apache.kafka.connect.transforms.InsertField$Value",
> "transforms.addTimestamp.timestamp.field": "message_timestamp",
> "predicates": "tombstone",
> "predicates.tombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
> {code}
> To avoid the cryptic error “java.lang.ClassCastException: class
> java.util.HashMap cannot be cast to class
> org.apache.kafka.connect.data.Struct” when processing a tombstone record, I
> had to add a negated predicate of RecordIsTombstone for ToJSON (community
> SMT) and HoistField, but did not need to add that to InsertField.
> Digging in the source, I find that InsertField handles the case where key or
> value is null:
>
> [https://github.com/a0x8o/kafka/blob/f8237749f6ad34c09154f807e53273be64e1261e/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/InsertField.java#L130]
> ^ Thanks to this, there's no need to add a predicate to skip
> InsertField$Value when value is null.
> It would help if the docs listed how the individual SMTs behave when dealing
> with a null key/value.
> Of course we can always find this out by trial and error or by studying the
> source code.
> But if we were to make a best practice of describing how an SMT handles null
> key/value, that would have two benefits:
> 1) Save developers time when working with the official (shipped with Kafka)
> SMT
> 2) Inspire developers who write their own SMT to likewise document how they
> handle null key/value
> Perhaps a standard way of dealing with nulls ("no-op if key/value is null")
> could be promoted, and SMT authors would only need to document their behavior
> when it differs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)