[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-07-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-22677:
---
Labels: pull-request-available  (was: )

> Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real 
> asynchronous fashion
> --
>
>                 Key: FLINK-22677
>                 URL: https://issues.apache.org/jira/browse/FLINK-22677
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Jin Xing
>            Assignee: Zhu Zhu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>
> The current scheduler enforces synchronous registration even though 
> ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With 
> a remote shuffle service, communication between the ShuffleMaster and the 
> remote cluster tends to be expensive. A synchronous registration risks 
> blocking the main thread and may cause side effects such as heartbeat 
> timeouts. In addition, expensive synchronous calls to the remote cluster can 
> bottleneck the throughput of shuffle resource allocation, especially for 
> batch jobs with complicated DAGs.
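
The difference between blocking on the returned CompletableFuture and composing on it can be sketched roughly as follows. This is a minimal illustration, not Flink's scheduler code: RemoteShuffleClient, registerBlocking, registerAsync and the "descriptor-" strings are hypothetical stand-ins, and the 200 ms sleep only simulates a slow remote round trip.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncRegistrationSketch {

    /** Hypothetical stand-in for a shuffle master backed by a remote cluster. */
    public interface RemoteShuffleClient {
        CompletableFuture<String> registerPartition(String partitionId);
    }

    /** Blocking style: the calling thread waits for the whole remote round trip. */
    public static String registerBlocking(RemoteShuffleClient client, String partitionId) {
        return client.registerPartition(partitionId).join();
    }

    /**
     * Non-blocking style: returns at once. The continuation is handed to the
     * given executor only after the remote call has completed.
     */
    public static CompletableFuture<String> registerAsync(
            RemoteShuffleClient client, String partitionId, Executor mainThread) {
        return client
                .registerPartition(partitionId)
                .thenApplyAsync(descriptor -> "deployed with " + descriptor, mainThread);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService remote = Executors.newSingleThreadExecutor();
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            RemoteShuffleClient client = id -> CompletableFuture.supplyAsync(() -> {
                try {
                    TimeUnit.MILLISECONDS.sleep(200); // simulated network round trip
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return "descriptor-" + id;
            }, remote);

            // The caller gets a future back without waiting for the remote call.
            CompletableFuture<String> pending = registerAsync(client, "partition-1", mainThread);
            System.out.println("pending.isDone() right after the call: " + pending.isDone());
            System.out.println(pending.get(5, TimeUnit.SECONDS));
        } finally {
            remote.shutdown();
            mainThread.shutdown();
        }
    }
}
```

In the blocking variant the caller's thread is occupied for the full round trip, which is the situation the description warns about. In the composed variant the thread stays free for other work, such as answering heartbeats, until the result arrives.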



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-07-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-22677:

Affects Version/s: 1.14.0

> Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real 
> asynchronous fashion
> --
>
>                 Key: FLINK-22677
>                 URL: https://issues.apache.org/jira/browse/FLINK-22677
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Jin Xing
>            Assignee: Zhu Zhu
>            Priority: Major
>
> The current scheduler enforces synchronous registration even though 
> ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With 
> a remote shuffle service, communication between the ShuffleMaster and the 
> remote cluster tends to be expensive. A synchronous registration risks 
> blocking the main thread and may cause side effects such as heartbeat 
> timeouts. In addition, expensive synchronous calls to the remote cluster can 
> bottleneck the throughput of shuffle resource allocation, especially for 
> batch jobs with complicated DAGs.





[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-07-02 Thread Zhu Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-22677:

Fix Version/s: 1.14.0

> Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real 
> asynchronous fashion
> --
>
>                 Key: FLINK-22677
>                 URL: https://issues.apache.org/jira/browse/FLINK-22677
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.14.0
>            Reporter: Jin Xing
>            Assignee: Zhu Zhu
>            Priority: Major
>             Fix For: 1.14.0
>
>
> The current scheduler enforces synchronous registration even though 
> ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With 
> a remote shuffle service, communication between the ShuffleMaster and the 
> remote cluster tends to be expensive. A synchronous registration risks 
> blocking the main thread and may cause side effects such as heartbeat 
> timeouts. In addition, expensive synchronous calls to the remote cluster can 
> bottleneck the throughput of shuffle resource allocation, especially for 
> batch jobs with complicated DAGs.





[jira] [Updated] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion

2021-05-16 Thread Jin Xing (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-22677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jin Xing updated FLINK-22677:
-
Description: The current scheduler enforces synchronous registration even 
though ShuffleMaster#registerPartitionWithProducer returns a 
CompletableFuture. With a remote shuffle service, communication between the 
ShuffleMaster and the remote cluster tends to be expensive. A synchronous 
registration risks blocking the main thread and may cause side effects such 
as heartbeat timeouts. In addition, expensive synchronous calls to the remote 
cluster can bottleneck the throughput of shuffle resource allocation, 
especially for batch jobs with complicated DAGs.  (was: The current scheduler 
enforces synchronous registration even though 
ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With 
a remote shuffle service, communication between the ShuffleMaster and the 
remote cluster tends to be expensive. A synchronous registration risks 
blocking the main thread and may cause side effects such as heartbeat timeouts.

In addition, expensive synchronous calls to the remote cluster can bottleneck 
the throughput of shuffle resource allocation, especially for batch jobs with 
complicated DAGs.)

> Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real 
> asynchronous fashion
> --
>
>                 Key: FLINK-22677
>                 URL: https://issues.apache.org/jira/browse/FLINK-22677
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Jin Xing
>            Priority: Major
>
> The current scheduler enforces synchronous registration even though 
> ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With 
> a remote shuffle service, communication between the ShuffleMaster and the 
> remote cluster tends to be expensive. A synchronous registration risks 
> blocking the main thread and may cause side effects such as heartbeat 
> timeouts. In addition, expensive synchronous calls to the remote cluster can 
> bottleneck the throughput of shuffle resource allocation, especially for 
> batch jobs with complicated DAGs.


