[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-22677:
-----------------------------------
    Labels: pull-request-available  (was: )

> Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real
> asynchronous fashion
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-22677
>                 URL: https://issues.apache.org/jira/browse/FLINK-22677
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Jin Xing
>            Assignee: Zhu Zhu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> The current scheduler enforces synchronous registration even though
> ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture.
> With a remote shuffle service, communication between the ShuffleMaster and
> the remote cluster tends to be expensive. Synchronous registration risks
> blocking the main thread and may cause negative side effects such as
> heartbeat timeouts. In addition, expensive synchronous calls to the remote
> cluster can bottleneck the throughput of shuffle resource allocation,
> especially for batch jobs with complicated DAGs.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhu Zhu updated FLINK-22677:
----------------------------
    Affects Version/s: 1.14.0

> Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real
> asynchronous fashion
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-22677
>                 URL: https://issues.apache.org/jira/browse/FLINK-22677
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Jin Xing
>            Assignee: Zhu Zhu
>            Priority: Major
>
> The current scheduler enforces synchronous registration even though
> ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture.
> With a remote shuffle service, communication between the ShuffleMaster and
> the remote cluster tends to be expensive. Synchronous registration risks
> blocking the main thread and may cause negative side effects such as
> heartbeat timeouts. In addition, expensive synchronous calls to the remote
> cluster can bottleneck the throughput of shuffle resource allocation,
> especially for batch jobs with complicated DAGs.
[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhu Zhu updated FLINK-22677:
----------------------------
    Fix Version/s: 1.14.0

> Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real
> asynchronous fashion
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-22677
>                 URL: https://issues.apache.org/jira/browse/FLINK-22677
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Jin Xing
>            Assignee: Zhu Zhu
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The current scheduler enforces synchronous registration even though
> ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture.
> With a remote shuffle service, communication between the ShuffleMaster and
> the remote cluster tends to be expensive. Synchronous registration risks
> blocking the main thread and may cause negative side effects such as
> heartbeat timeouts. In addition, expensive synchronous calls to the remote
> cluster can bottleneck the throughput of shuffle resource allocation,
> especially for batch jobs with complicated DAGs.
[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion
[ https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jin Xing updated FLINK-22677:
-----------------------------
    Description:
The current scheduler enforces synchronous registration even though
ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture.
With a remote shuffle service, communication between the ShuffleMaster and
the remote cluster tends to be expensive. Synchronous registration risks
blocking the main thread and may cause negative side effects such as
heartbeat timeouts. In addition, expensive synchronous calls to the remote
cluster can bottleneck the throughput of shuffle resource allocation,
especially for batch jobs with complicated DAGs.

  (was:
The current scheduler enforces synchronous registration even though
ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture.
With a remote shuffle service, communication between the ShuffleMaster and
the remote cluster tends to be expensive. Synchronous registration risks
blocking the main thread and may cause negative side effects such as
heartbeat timeouts. In addition, expensive synchronous calls to the remote
cluster can bottleneck the throughput of shuffle resource allocation,
especially for batch jobs with complicated DAGs.)

> Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real
> asynchronous fashion
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-22677
>                 URL: https://issues.apache.org/jira/browse/FLINK-22677
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Jin Xing
>            Priority: Major
>
> The current scheduler enforces synchronous registration even though
> ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture.
> With a remote shuffle service, communication between the ShuffleMaster and
> the remote cluster tends to be expensive. Synchronous registration risks
> blocking the main thread and may cause negative side effects such as
> heartbeat timeouts. In addition, expensive synchronous calls to the remote
> cluster can bottleneck the throughput of shuffle resource allocation,
> especially for batch jobs with complicated DAGs.
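
To illustrate the distinction the issue description draws, here is a minimal Java sketch contrasting a blocking wait on a CompletableFuture with composing a continuation on it. It is not Flink's scheduler code: RemoteShuffleClient, slowClient, registerBlocking and registerAsync are hypothetical names introduced for illustration, and a single-thread executor is assumed to stand in for the scheduler's main thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncRegistrationSketch {

    /** Hypothetical stand-in for a shuffle master that talks to a remote cluster. */
    interface RemoteShuffleClient {
        CompletableFuture<String> registerPartition(String partitionId);
    }

    /** Simulates an expensive remote call that completes on a separate I/O pool. */
    static RemoteShuffleClient slowClient(Executor ioPool, long delayMillis) {
        return partitionId -> CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "descriptor-" + partitionId;
        }, ioPool);
    }

    /** Blocking style: the calling thread waits until the remote call finishes. */
    static String registerBlocking(RemoteShuffleClient client, String partitionId) {
        return client.registerPartition(partitionId).join();
    }

    /**
     * Asynchronous style: returns immediately; the follow-up step runs on the
     * given executor (assumed to represent the main thread) once the result arrives.
     */
    static CompletableFuture<String> registerAsync(
            RemoteShuffleClient client, String partitionId, Executor mainThread) {
        return client.registerPartition(partitionId)
                .thenApplyAsync(descriptor -> "deployed with " + descriptor, mainThread);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ioPool = Executors.newFixedThreadPool(4);
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            RemoteShuffleClient client = slowClient(ioPool, 200);

            // Both registrations are issued without waiting, so the remote
            // calls can overlap instead of running one after another.
            CompletableFuture<String> first = registerAsync(client, "p1", mainThread);
            CompletableFuture<String> second = registerAsync(client, "p2", mainThread);

            System.out.println(first.get(5, TimeUnit.SECONDS));
            System.out.println(second.get(5, TimeUnit.SECONDS));
        } finally {
            ioPool.shutdown();
            mainThread.shutdown();
        }
    }
}
```

In the blocking variant the caller is occupied for the full duration of every remote call, which is the situation the description associates with heartbeat timeouts. In the asynchronous variant the caller only schedules a continuation, so several registrations can be in flight at once.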