[ https://issues.apache.org/jira/browse/KAFKA-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813176#comment-17813176 ]

Mickael Maison commented on KAFKA-15912:
----------------------------------------

The Transformation interface only mentions that
[implementations of apply() must be thread safe|https://github.com/apache/kafka/blob/trunk/connect/api/src/main/java/org/apache/kafka/connect/transforms/Transformation.java#L46].
The Predicate interface has no equivalent statement.

> Parallelize conversion and transformation steps in Connect
> ----------------------------------------------------------
>
>                 Key: KAFKA-15912
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15912
>             Project: Kafka
>          Issue Type: Improvement
>          Components: connect
>            Reporter: Mickael Maison
>            Priority: Major
>
> In busy Connect pipelines, the conversion and transformation steps can 
> sometimes have a very significant impact on performance. This is especially 
> true with large records with complex schemas, for example with CDC connectors 
> like Debezium.
> Today, in order to always preserve ordering, converters and transformations 
> are called on one record at a time in a single thread in the Connect worker. 
> As Connect usually handles records in batches (up to max.poll.records in sink 
> pipelines; in source pipelines it depends on the connector, but most 
> connectors I've seen still tend to return multiple records each loop), 
> it could be highly beneficial to attempt running the converters and 
> transformation chain in parallel using a pool of processing threads.
> It should be possible to do some of these steps in parallel and still keep 
> exact ordering. I'm even considering whether an option to lose ordering but 
> allow even faster processing would make sense.
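
To illustrate the "parallel but still ordered" idea from the description, here
is a minimal sketch in plain Java. It is not Connect code: the class
OrderedParallelBatch, the method processInOrder and the step function are
hypothetical names standing in for the converter and transformation chain, and
the sketch assumes the step is thread safe. Each record is submitted to a
pool, and the results are collected in submission order, so the output order
matches the input order even if tasks finish out of order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper, not part of the Kafka Connect API.
final class OrderedParallelBatch {

    // Runs `step` on every record of the batch using the pool and returns
    // the results in the same order as the input batch.
    // Assumption: `step` is thread safe, since it is called concurrently.
    static <I, O> List<O> processInOrder(List<I> batch,
                                         Function<I, O> step,
                                         ExecutorService pool) {
        List<CompletableFuture<O>> futures = new ArrayList<>(batch.size());
        for (I record : batch) {
            // Tasks may run and complete in any order on the pool.
            futures.add(CompletableFuture.supplyAsync(() -> step.apply(record), pool));
        }
        List<O> results = new ArrayList<>(batch.size());
        for (CompletableFuture<O> future : futures) {
            // Joining in submission order restores the input order.
            results.add(future.join());
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<String> out = processInOrder(
                    List.of("a", "b", "c"), (String s) -> s.toUpperCase(), pool);
            System.out.println(out); // prints [A, B, C]
        } finally {
            pool.shutdown();
        }
    }
}
```

The batch still completes only as fast as its slowest record, and a failure in
any task surfaces as a CompletionException from join(); how a real
implementation would handle errors and retries is left open here.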



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
