[jira] [Updated] (KAFKA-13320) Suggestion: SMT support for null key/value should be documented

Ben Ellis (Jira) Thu, 23 Sep 2021 07:04:32 -0700


     [ https://issues.apache.org/jira/browse/KAFKA-13320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ben Ellis updated KAFKA-13320:
------------------------------
    Description: 
While working with a JDBC Sink Connector, I noticed that some SMTs fail on a 
tombstone record (null value), while others handle tombstones fine.

For example:

{code:javascript}
"transforms": "flattenKey,valueToJSON,wrapValue,addTimestamp",
"transforms.flattenKey.type": "org.apache.kafka.connect.transforms.Flatten$Key",
"transforms.flattenKey.delimiter": "_",
"transforms.valueToJSON.type": "com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Value",
"transforms.valueToJSON.schemas.enable": "false",
"transforms.valueToJSON.predicate": "tombstone",
"transforms.valueToJSON.negate": true,
"transforms.wrapValue.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.wrapValue.field": "matrix",
"transforms.wrapValue.predicate": "tombstone",
"transforms.wrapValue.negate": true,
"transforms.addTimestamp.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addTimestamp.timestamp.field": "message_timestamp",
"predicates": "tombstone",
"predicates.tombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
{code}


To avoid the cryptic error “java.lang.ClassCastException: class 
java.util.HashMap cannot be cast to class org.apache.kafka.connect.data.Struct” 
when processing a tombstone record, I had to add a negated predicate of 
RecordIsTombstone for ToJSON (community SMT) and HoistField, but did not need 
to add that to InsertField.

Digging in the source, I find that InsertField handles the case where key or 
value is null:
https://github.com/a0x8o/kafka/blob/f8237749f6ad34c09154f807e53273be64e1261e/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/InsertField.java#L130

^ Thanks to this, there's no need to add a predicate to skip InsertField$Value 
when value is null.
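
To illustrate the kind of guard meant here, the following is a minimal sketch in plain Java. It is not the Kafka Connect API or the InsertField source: the class name, the insertTimestamp method, and the Map-based value are hypothetical stand-ins for a transform working on a schemaless record value.

```java
import java.util.HashMap;
import java.util.Map;

public class TombstoneGuardSketch {

    // Hypothetical stand-in for a value transform: returns a copy of a
    // schemaless (Map-based) value with a timestamp field added.
    public static Map<String, Object> insertTimestamp(
            Map<String, Object> value, String field, long timestamp) {
        // Tombstone guard: a null value is passed through unchanged
        // instead of being cast or dereferenced.
        if (value == null) {
            return null;
        }
        Map<String, Object> updated = new HashMap<>(value);
        updated.put(field, timestamp);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Object> value = new HashMap<>();
        value.put("id", 42);
        // Regular record: the field is added to a copy of the value.
        System.out.println(insertTimestamp(value, "message_timestamp", 1632405872000L));
        // Tombstone: the null value comes back as null, with no exception.
        System.out.println(insertTimestamp(null, "message_timestamp", 1632405872000L));
    }
}
```

A transform written without the null check would fail on the tombstone as soon as it touched the value, which is the situation the negated RecordIsTombstone predicate works around in the configuration above.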

It would help if the docs listed how the individual SMTs behave when dealing 
with a null key/value.

Of course we can always find this out by trial and error or by studying the 
source code.
But if we were to make a best practice of describing how an SMT handles null 
key/value, that would have two benefits:
1) Save developers time when working with the official SMTs (those shipped with Kafka)
2) Encourage developers who write their own SMTs to likewise document how they 
handle null keys/values

Perhaps a standard way of dealing with nulls ("no-op if key/value is null") 
could be promoted, and SMT authors would only need to document their behavior 
when it differs.
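
As a rough illustration of what a "no-op if key/value is null" convention could look like, here is a small sketch in plain Java. It is only an assumption about one possible shape, not an existing Kafka Connect facility: NoOpOnNullSketch and noOpOnNull are hypothetical names, and a UnaryOperator stands in for a transform.

```java
import java.util.function.UnaryOperator;

public class NoOpOnNullSketch {

    // Wraps any value transform so that a null input is returned as-is
    // and the wrapped transform is never invoked for it.
    public static <T> UnaryOperator<T> noOpOnNull(UnaryOperator<T> transform) {
        return value -> value == null ? null : transform.apply(value);
    }

    public static void main(String[] args) {
        // String::trim alone would throw a NullPointerException on null.
        UnaryOperator<String> safeTrim = noOpOnNull(String::trim);
        System.out.println(safeTrim.apply(" matrix ")); // prints "matrix"
        System.out.println(safeTrim.apply(null));       // prints "null"
    }
}
```

With a default like this, only transforms that deliberately do something with a null key or value would need to call that out in their documentation.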


  was:
While working with a JDBC Sink Connector, I noticed that some SMT choke on a 
tombstone (null value) while others handle tombstones fine.

For example:

```
"transforms": "flattenKey,valueToJSON,wrapValue,addTimestamp", 
"transforms.flattenKey.type": 
"org.apache.kafka.connect.transforms.Flatten$Key", 
"transforms.flattenKey.delimiter": "_", "transforms.valueToJSON.type": 
"com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Value", 
"transforms.valueToJSON.schemas.enable": "false", 
"transforms.valueToJSON.predicate": "tombstone", 
"transforms.valueToJSON.negate": true, 
"transforms.wrapValue.type":"org.apache.kafka.connect.transforms.HoistField$Value",
 "transforms.wrapValue.field":"matrix", "transforms.wrapValue.predicate": 
"tombstone", "transforms.wrapValue.negate": true, 
"transforms.addTimestamp.type": 
"org.apache.kafka.connect.transforms.InsertField$Value", 
"transforms.addTimestamp.timestamp.field": "message_timestamp", "predicates": 
"tombstone", "predicates.tombstone.type": 
"org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"

```

To avoid the cryptic error “java.lang.ClassCastException: class 
java.util.HashMap cannot be cast to class org.apache.kafka.connect.data.Struct” 
when processing a tombstone record, I had to add a negated predicate of 
RecordIsTombstone for ToJSON (community SMT) and HoistField, but did not need 
to add that to InsertField.

Digging in the source, I find that InsertField handles the case where key or 
value is null:
https://github.com/a0x8o/kafka/blob/f8237749f6ad34c09154f807e53273be64e1261e/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/InsertField.java#L130

^ Thanks to this, there's no need to add a predicate to skip InsertField$Value 
when value is null.

It would help if the docs listed how the individual SMTs behave when dealing 
with a null key/value.

Of course we can always find this out by trial and error or by studying the 
source code.
But if we were to make a best practice of describing how an SMT handles null 
key/value, that would have two benefits:
1) Save developers time when working with the official (shipped with Kafka) SMT
2) Inspire developers who write their own SMT to likewise document how they 
handle null key/value

Perhaps a standard way of dealing with nulls ("no-op if key/value is null") 
could be promoted, and SMT authors would only need to document their behavior 
when it differs.



> Suggestion: SMT support for null key/value should be documented
> ---------------------------------------------------------------
>
>                 Key: KAFKA-13320
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13320
>             Project: Kafka
>          Issue Type: Wish
>          Components: KafkaConnect
>            Reporter: Ben Ellis
>            Priority: Minor
>
> 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
