[ https://issues.apache.org/jira/browse/KAFKA-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813176#comment-17813176 ]
Mickael Maison commented on KAFKA-15912:
----------------------------------------

The Transformation interface documents that [implementations of apply() must be thread safe|https://github.com/apache/kafka/blob/trunk/connect/api/src/main/java/org/apache/kafka/connect/transforms/Transformation.java#L46]. The Predicate interface makes no such requirement.

> Parallelize conversion and transformation steps in Connect
> ----------------------------------------------------------
>
>                 Key: KAFKA-15912
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15912
>             Project: Kafka
>          Issue Type: Improvement
>          Components: connect
>            Reporter: Mickael Maison
>            Priority: Major
>
> In busy Connect pipelines, the conversion and transformation steps can
> sometimes have a very significant impact on performance. This is especially
> true with large records with complex schemas, for example with CDC connectors
> like Debezium.
> Today, to always preserve ordering, converters and transformations are
> called one record at a time, in a single thread, in the Connect worker.
> As Connect usually handles records in batches (up to max.poll.records in sink
> pipelines; in source pipelines it really depends on the connector, but most
> connectors I've seen still tend to return multiple records on each loop),
> it could be highly beneficial to run the converters and transformation chain
> in parallel on a pool of processing threads.
> It should be possible to do some of these steps in parallel and still keep
> exact ordering. I'm even considering whether an option that gives up ordering
> in exchange for even faster processing would make sense.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
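A minimal sketch of the "parallel but still in exact order" idea, in plain Java with no Kafka dependency. The class `OrderedParallelProcessor` and its `Function<String, String>` chain are hypothetical stand-ins for a worker's converter and transformation chain; this is not Connect's API or the design that KAFKA-15912 settles on. Every record in a batch is submitted to a fixed thread pool, and results are collected by calling `Future.get()` in submission order, so the output order matches the input order even when later records finish first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class OrderedParallelProcessor {

    // Hypothetical stand-in for the converter + transformation chain.
    // One instance is shared by all pool threads, so it must be thread safe
    // (as Transformation.apply() is documented to be).
    private final Function<String, String> chain;
    private final ExecutorService pool;

    public OrderedParallelProcessor(Function<String, String> chain, int threads) {
        this.chain = chain;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Processes a batch in parallel but returns results in input order. */
    public List<String> process(List<String> batch)
            throws InterruptedException, ExecutionException {
        // Submit every item first so the pool can work on them concurrently.
        List<Future<String>> futures = new ArrayList<>(batch.size());
        for (String item : batch) {
            futures.add(pool.submit(() -> chain.apply(item)));
        }
        // Collect in submission order: get() blocks until that item is done,
        // so output order matches input order regardless of completion order.
        List<String> out = new ArrayList<>(batch.size());
        for (Future<String> f : futures) {
            out.add(f.get());
        }
        return out;
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        OrderedParallelProcessor p = new OrderedParallelProcessor(String::toUpperCase, 4);
        System.out.println(p.process(List.of("a", "b", "c", "d"))); // prints [A, B, C, D]
        p.shutdown();
    }
}
```

The trade-off of collecting in order is head-of-line blocking: one slow record delays the hand-off of every record after it, although those records keep being processed in the background. Sharing one chain instance across pool threads is only safe if every step in it is thread safe; a `Predicate`, which carries no documented thread-safety guarantee, would presumably need its own synchronization or one instance per thread.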