[ https://issues.apache.org/jira/browse/FLINK-32027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720378#comment-17720378 ]
Yun Tang commented on FLINK-32027:
----------------------------------

[~Weijie Guo] [~zhuzh] Please take a look at this issue.

> Batch jobs could hang at shuffle phase when max parallelism is really large
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-32027
>                 URL: https://issues.apache.org/jira/browse/FLINK-32027
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>    Affects Versions: 1.17.0
>            Reporter: Yun Tang
>            Priority: Critical
>             Fix For: 1.17.1
>
>         Attachments: image-2023-05-08-11-12-58-361.png
>
>
> In batch stream mode with adaptive batch scheduling, if we set the max
> parallelism as large as 32768 (pipeline.max-parallelism), the job could hang
> at the shuffle phase.
> It would hang for a long time and show "No bytes sent":
> !image-2023-05-08-11-12-58-361.png!
> After some debugging, we can see that the downstream operator did not receive
> the end-of-partition event.