[jira] [Created] (KAFKA-13926) Proposal to have "HasField" predicate for kafka connect

2022-05-22 Thread Kumud Kumar Srivatsava Tirupati (Jira)
Kumud Kumar Srivatsava Tirupati created KAFKA-13926:
---

 Summary: Proposal to have "HasField" predicate for kafka connect
 Key: KAFKA-13926
 URL: https://issues.apache.org/jira/browse/KAFKA-13926
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Kumud Kumar Srivatsava Tirupati


Hello,

Today's Connect predicates enable checks on the record metadata. However, this 
can be limiting, considering that *many built-in and custom transformations that 
we (the community) use are more key/value centric*.

Some use cases this can solve:
 * Data type conversions of certain pre-identified fields, for records coming 
from many datasets, only if those fields exist. [Ex: TimestampConverter can be 
run only if the specified date field exists, irrespective of the record metadata.]
 * Skipping a certain transform if a given field does or does not exist. A lot 
of built-in transforms raise exceptions (Ex: the InsertField transform if the 
field already exists), thereby breaking the task. Giving users this control 
enables them to configure consciously for such cases.
 * Even though some built-in transforms explicitly handle these cases, running 
them would still be an unnecessary pass-through loop.
 * Considering that each connector usually deals with multiple datasets (even 
100s for a database CDC connector), metadata-centric predicate checking is 
somewhat limiting when it comes to such pre-identified custom metadata fields 
in the records.
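
A minimal sketch of the idea behind the use cases above. It is an illustration 
only, not the implementation mentioned in this ticket and not part of Kafka. In 
Kafka Connect a predicate implements 
org.apache.kafka.connect.transforms.predicates.Predicate and is evaluated 
through its test(record) method; to stay runnable with the JDK alone, the 
sketch instead models a record value as a plain Map. The class HasFieldSketch, 
its helper methods, and the field name "created_at" are all assumptions made 
for the example.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Illustrative sketch only: record values are modelled as plain Maps rather
// than Kafka Connect Structs, so this compiles with the JDK alone.
final class HasFieldSketch {

    // Hypothetical check: true when the value has a top-level field with this name.
    static boolean hasField(Map<String, Object> value, String field) {
        return value != null && value.containsKey(field);
    }

    // Example transform (assumed field name "created_at"): epoch millis -> ISO-8601 text.
    // It throws if the field is missing, which is the failure a predicate would guard against.
    static Map<String, Object> epochMillisToIso(Map<String, Object> value) {
        Number millis = (Number) value.get("created_at");
        Map<String, Object> copy = new LinkedHashMap<>(value);
        copy.put("created_at", Instant.ofEpochMilli(millis.longValue()).toString());
        return copy;
    }

    // Applies the transform only when the predicate matches; otherwise the
    // value passes through untouched.
    static Map<String, Object> applyIf(Predicate<Map<String, Object>> predicate,
                                       UnaryOperator<Map<String, Object>> transform,
                                       Map<String, Object> value) {
        return predicate.test(value) ? transform.apply(value) : value;
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> hasCreatedAt = v -> hasField(v, "created_at");

        // Two records from different datasets; only the first has the date field.
        Map<String, Object> orders = new LinkedHashMap<>();
        orders.put("id", 1);
        orders.put("created_at", 0L);

        Map<String, Object> customers = new LinkedHashMap<>();
        customers.put("id", 2);

        System.out.println(applyIf(hasCreatedAt, HasFieldSketch::epochMillisToIso, orders));
        System.out.println(applyIf(hasCreatedAt, HasFieldSketch::epochMillisToIso, customers));
    }
}
```

In this sketch the record without the field passes through unchanged instead 
of causing the transform to throw.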

I know some of these cases can be handled within the transforms themselves, but 
that defeats the purpose of having predicates.

We have built this predicate for our own use and have found it extremely 
helpful. Please let me know your thoughts so that I can raise a PR.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (KAFKA-13926) Proposal to have "HasField" predicate for kafka connect

2022-05-22 Thread Kumud Kumar Srivatsava Tirupati (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540756#comment-17540756
 ] 

Kumud Kumar Srivatsava Tirupati commented on KAFKA-13926:
-

Sure [~showuon]. I thought adding a new predicate wouldn't affect any existing 
public API, but I will follow the KIP process. Thanks.

> Proposal to have "HasField" predicate for kafka connect
> ---
>
> Key: KAFKA-13926
> URL: https://issues.apache.org/jira/browse/KAFKA-13926
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Kumud Kumar Srivatsava Tirupati
>Assignee: Kumud Kumar Srivatsava Tirupati
>Priority: Major
>


[jira] [Updated] (KAFKA-13926) Proposal to have "HasField" predicate for kafka connect

2022-05-25 Thread Kumud Kumar Srivatsava Tirupati (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kumud Kumar Srivatsava Tirupati updated KAFKA-13926:

Description: 
Hello,

Today's Connect predicates enable checks on the record metadata. However, this 
can be limiting, considering that *many built-in and custom transformations that 
we (the community) use are more key/value centric*.

Some use cases this can solve:
 * Data type conversions of certain pre-identified fields, for records coming 
from many datasets, only if those fields exist. [Ex: TimestampConverter can be 
run only if the specified date field exists, irrespective of the record metadata.]
 * Skipping a certain transform if a given field does or does not exist. A lot 
of built-in transforms raise exceptions (Ex: the InsertField transform if the 
field already exists), thereby breaking the task. Giving users this control 
enables them to configure consciously for such cases.
 * Even though some built-in transforms explicitly handle these cases, running 
them would still be an unnecessary pass-through loop.
 * Considering that each connector usually deals with multiple datasets (even 
100s for a database CDC connector), metadata-centric predicate checking is 
somewhat limiting when it comes to such pre-identified custom metadata fields 
in the records.

I know some of these cases can be handled within the transforms themselves, but 
that defeats the purpose of having predicates.

We have built this predicate for our own use and have found it extremely 
helpful. Please let me know your thoughts so that I can raise a PR.

 

KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-845%3A+%27HasField%27+predicate+for+kafka+connect



[jira] [Commented] (KAFKA-13926) Proposal to have "HasField" predicate for kafka connect

2022-05-25 Thread Kumud Kumar Srivatsava Tirupati (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542074#comment-17542074
 ] 

Kumud Kumar Srivatsava Tirupati commented on KAFKA-13926:
-

[~showuon] I have added a new KIP at 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-845%3A+%27HasField%27+predicate+for+kafka+connect]

Please review.



[jira] [Resolved] (KAFKA-13926) Proposal to have "HasField" predicate for kafka connect

2022-06-03 Thread Kumud Kumar Srivatsava Tirupati (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kumud Kumar Srivatsava Tirupati resolved KAFKA-13926.
-
Resolution: Won't Fix

Dropping in favor of improving the existing SMTs as per the discussion.

https://lists.apache.org/thread/odbj7793plyz7xxyy6d71c3xn7zng49f
