Padmaprasad Aithal created KAFKA-18830:
------------------------------------------
Summary: Enable parallel processing for transformation chain
Key: KAFKA-18830
URL: https://issues.apache.org/jira/browse/KAFKA-18830
Project: Kafka
Issue Type: Improvement
Components: connect
Reporter: Padmaprasad Aithal
For Kafka Connect with Debezium plugin, we have below flow
Database Logs <- Debezium Plugin <- Kafka Connect -> Transformation Chain
(Serial) -> Kafka Topic
If transformation chain is expensive operation, overall latency from Database
to Topic will be higher. If all transformations in the chain could be run
parallell, we can leverage parallel transformation to improve the throughput
and reduce overall latency without disturbing the order of events.
With parallel transformation, we will be having below flow:
{noformat}
Database Logs <- Debezium Plugin <- Kafka Connect -> Transformation Chain
(Parallel) -> Kafka Topic{noformat}
Pseudo Code:
{noformat}
if (isParallelTransformEnabled) {
toSend.parallelStream().forEach(record -> {
SourceRecord transformedRecord = transformationChain.apply(record);
if (transformedRecord != null) {
transformedRecordMap.put(record, transformedRecord);
}
});
}
...
if (isParallelTransformEnabled) {
record = transformedRecordMap.get(preTransformRecord);
} else {
record = transformationChain.apply(preTransformRecord);
}{noformat}
PS:
Since all transformation may not be thread safe and support parallelism, this
feature is enabled through the feature flag
The same implementation can be found here:
https://github.com/confluentinc/kafka/pull/1282
--
This message was sent by Atlassian Jira
(v8.20.10#820010)