Yun Tang created FLINK-32027:
--------------------------------

Summary: Batch jobs could hang at shuffle phase when max parallelism is really large
Key: FLINK-32027
URL: https://issues.apache.org/jira/browse/FLINK-32027
Project: Flink
Issue Type: Bug
Components: Runtime / Network
Affects Versions: 1.17.0
Reporter: Yun Tang
Fix For: 1.17.1
Attachments: image-2023-05-08-11-12-58-361.png

In batch execution mode with the adaptive batch scheduler, if we set the max parallelism as large as 32768 (pipeline.max-parallelism), the job could hang at the shuffle phase. It would hang for a long time and show "No bytes sent":

!image-2023-05-08-11-12-58-361.png!

After some debugging, we found that the downstream operator did not receive the end-of-partition event.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)