David Dufour created KAFKA-14064:
------------------------------------

             Summary: MirrorMaker2 stops task when record is too big
                 Key: KAFKA-14064
                 URL: https://issues.apache.org/jira/browse/KAFKA-14064
             Project: Kafka
          Issue Type: Improvement
          Components: mirrormaker
    Affects Versions: 2.7.1
            Reporter: David Dufour


As MirrorMaker2 does currently not support shallow mirrorring ([KIP-712: 
Shallow 
Mirroring|https://wiki.apache.org/confluence/display/KAFKA/KIP-712%3A+Shallow+Mirroring]),
 if a producer has produced using compression in one mirrorred topic, 
MirrorMaker2 will get the message uncompressed at some point and if not 
properly tuned (typically {{{}max.request.size{}}}), it may fail with a 
RecordTooLargeException:
org.apache.kafka.connect.errors.ConnectException: Unrecoverable exception from 
producer send callback
    at 
org.apache.kafka.connect.runtime.WorkerSourceTask.maybeThrowProducerSendException(WorkerSourceTask.java:284)
    at 
org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:338)
    at 
org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:256)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:238)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The 
message is 1049087 bytes when serialized which is larger than 1048576, which is 
the value of the max.request.size configuration.\n"
          worker_id: 'xxx.xxx.xxx.xxx:8083'

The task is stopped and needs a manual restart. 
However, this seems to be a bit overkill because, amongst all partitions 
replicated by the task, only one is problematic. Stopping the replication on 
all partitions can make a severe impact.
It would be more optimized to 'suspend' the partition involved and keep 
replication working for all remaining ones.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to