Jin Xing created FLINK-22677:
--------------------------------

             Summary: Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion
                 Key: FLINK-22677
                 URL: https://issues.apache.org/jira/browse/FLINK-22677
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
            Reporter: Jin Xing

The current scheduler enforces synchronous registration even though ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With a remote shuffle service, communication between the ShuffleMaster and the remote cluster tends to be expensive. A synchronous registration risks blocking the main thread and may cause negative side effects such as heartbeat timeouts. In addition, expensive synchronous calls to the remote cluster could bottleneck the throughput of shuffle resource requests, especially for batch jobs with complicated DAGs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
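
To illustrate the difference the description draws between blocking on the returned CompletableFuture and chaining onto it, here is a minimal sketch in plain Java. It does not use Flink classes: RemoteShuffleClient, registerBlocking, registerAsync, and the "descriptor-" strings are hypothetical stand-ins invented for this example, and the 200 ms sleep only simulates an expensive remote call. It is not the scheduler's actual code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncRegistrationSketch {

    /** Hypothetical stand-in for a shuffle master that talks to a remote cluster. */
    interface RemoteShuffleClient {
        CompletableFuture<String> registerPartition(String partitionId);
    }

    /** Blocking style: the calling thread waits until the remote reply arrives. */
    static String registerBlocking(RemoteShuffleClient client, String partitionId) {
        return client.registerPartition(partitionId).join();
    }

    /**
     * Non-blocking style: returns immediately. The follow-up step runs on the
     * given executor once the registration future completes.
     */
    static CompletableFuture<String> registerAsync(
            RemoteShuffleClient client, String partitionId, Executor mainThreadExecutor) {
        return client
                .registerPartition(partitionId)
                .thenApplyAsync(descriptor -> "deployed with " + descriptor, mainThreadExecutor);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService remoteIo = Executors.newSingleThreadExecutor();
        ExecutorService mainThread = Executors.newSingleThreadExecutor();

        // Simulated slow remote registration (assumption: 200 ms round trip).
        RemoteShuffleClient slowClient = id -> CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "descriptor-" + id;
        }, remoteIo);

        // The call returns at once; the caller is free to do other work meanwhile.
        CompletableFuture<String> result = registerAsync(slowClient, "p1", mainThread);
        System.out.println("done right after the call? " + result.isDone());
        System.out.println(result.get(5, TimeUnit.SECONDS));

        remoteIo.shutdown();
        mainThread.shutdown();
    }
}
```

In the blocking variant the caller is stalled for the full round trip of every registration. In the chained variant the caller only schedules the continuation, so a single-threaded main executor could keep serving other work, such as heartbeats, while remote registrations are in flight.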