[ 
https://issues.apache.org/jira/browse/FLINK-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhu Zhu updated FLINK-17330:
----------------------------
    Description: 
Imagine a job like this:
A -- (pipelined FORWARD) --> B -- (blocking ALL-to-ALL) --> D
A -- (pipelined FORWARD) --> C -- (pipelined FORWARD) --> D
parallelism=2 for all vertices.

We will have 2 execution pipelined regions:
R1={A1, B1, C1, D1}
R2={A2, B2, C2, D2}

R1 has a cross-region input edge (B2->D1).
R2 has a cross-region input edge (B1->D2).

A scheduling deadlock will occur because a region is scheduled only when all of 
its inputs are consumable (i.e. the blocking partitions it consumes are 
finished): R1 can be scheduled only after R2 finishes, while R2 can be scheduled 
only after R1 finishes.
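
The circular wait can be reproduced with a small model of the example job. This 
is only a sketch: the dictionaries and function names are hypothetical and are 
not Flink APIs, and the model assumes that only cross-region blocking edges gate 
the scheduling of a region.

```python
# Hypothetical model of the example job; none of these names are Flink APIs.
regions = {
    "R1": {"A1", "B1", "C1", "D1"},
    "R2": {"A2", "B2", "C2", "D2"},
}

# Blocking ALL-to-ALL edges B -> D with parallelism 2, as (producer, consumer).
blocking_edges = [("B1", "D1"), ("B1", "D2"), ("B2", "D1"), ("B2", "D2")]


def region_of(vertex):
    """Return the name of the region that contains the given vertex."""
    return next(name for name, members in regions.items() if vertex in members)


def cross_region_inputs():
    """Map each region to the other regions whose blocking output it consumes."""
    deps = {name: set() for name in regions}
    for producer, consumer in blocking_edges:
        src, dst = region_of(producer), region_of(consumer)
        if src != dst:
            deps[dst].add(src)
    return deps


def simulate_scheduling(deps):
    """Schedule a region only once every region it depends on has finished."""
    finished = []
    progress = True
    while progress:
        progress = False
        for name in sorted(deps):
            if name not in finished and deps[name] <= set(finished):
                finished.append(name)
                progress = True
    return finished


deps = cross_region_inputs()
scheduled = simulate_scheduling(deps)
# Regions that never became schedulable: both R1 and R2 in this example.
stuck = sorted(set(regions) - set(scheduled))
print(deps)
print(stuck)
```

In this model neither region ever becomes schedulable, because each one waits 
for a blocking partition produced by the other.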

To avoid this, one solution is to force a logical pipelined region that contains 
intra-region ALL-to-ALL blocking edges to form a single execution pipelined 
region, so that there is no cyclic input dependency between regions.
Besides that, we should also take care to avoid cyclic cross-region 
POINTWISE blocking edges.
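
A minimal sketch of the proposed merge, assuming the intra-logical-region 
ALL-to-ALL blocking edges are already known. The helper merge_regions and the 
data layout are hypothetical illustrations, not the actual Flink implementation; 
the sketch uses a simple union-find to join every pair of execution regions 
connected by such an edge.

```python
# Hypothetical sketch; merge_regions and the data layout are not Flink APIs.
regions = {
    "R1": {"A1", "B1", "C1", "D1"},
    "R2": {"A2", "B2", "C2", "D2"},
}

# ALL-to-ALL blocking edges inside one logical pipelined region (B -> D).
all_to_all_blocking = [("B1", "D1"), ("B1", "D2"), ("B2", "D1"), ("B2", "D2")]


def merge_regions(regions, edges):
    """Union execution regions that are connected by the given blocking edges."""
    parent = {name: name for name in regions}

    def find(name):
        # Follow parent links to the representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    owner = {v: name for name, members in regions.items() for v in members}
    for producer, consumer in edges:
        a, b = find(owner[producer]), find(owner[consumer])
        if a != b:
            # Keep the lexicographically smaller name as the representative.
            parent[max(a, b)] = min(a, b)

    merged = {}
    for name, members in regions.items():
        merged.setdefault(find(name), set()).update(members)
    return merged


merged = merge_regions(regions, all_to_all_blocking)
print(merged)
```

With the example job, R1 and R2 collapse into one region holding all eight 
vertices, so the B -> D edges become intra-region edges and no cross-region 
cycle remains.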

  was:
Imagine a job like this:
A -- (pipelined FORWARD) --> B -- (blocking ALL-to-ALL) --> D
A -- (pipelined FORWARD) --> C -- (pipelined FORWARD) --> D
parallelism=2 for all vertices.

We will have 2 execution pipelined regions:
R1={A1, B1, C1, D1}, R2={A2, B2, C2, D2}

R1 has a cross-region input edge (B2->D1).
R2 has a cross-region input edge (B1->D2).

A scheduling deadlock will occur because a region is scheduled only when all of 
its inputs are consumable (i.e. the blocking partitions it consumes are 
finished): R1 can be scheduled only after R2 finishes, while R2 can be scheduled 
only after R1 finishes.

To avoid this, one solution is to force a logical pipelined region that contains 
intra-region ALL-to-ALL blocking edges to form a single execution pipelined 
region, so that there is no cyclic input dependency between regions.
Besides that, we should also take care to avoid cyclic cross-region 
POINTWISE blocking edges.


> Avoid scheduling deadlocks caused by intra-logical-region ALL-to-ALL blocking 
> edges
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-17330
>                 URL: https://issues.apache.org/jira/browse/FLINK-17330
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.11.0
>            Reporter: Zhu Zhu
>            Priority: Major
>             Fix For: 1.11.0
>
>
> Imagine a job like this:
> A -- (pipelined FORWARD) --> B -- (blocking ALL-to-ALL) --> D
> A -- (pipelined FORWARD) --> C -- (pipelined FORWARD) --> D
> parallelism=2 for all vertices.
> We will have 2 execution pipelined regions:
> R1={A1, B1, C1, D1}
> R2={A2, B2, C2, D2}
> R1 has a cross-region input edge (B2->D1).
> R2 has a cross-region input edge (B1->D2).
> A scheduling deadlock will occur because a region is scheduled only when all 
> of its inputs are consumable (i.e. the blocking partitions it consumes are 
> finished): R1 can be scheduled only after R2 finishes, while R2 can be 
> scheduled only after R1 finishes.
> To avoid this, one solution is to force a logical pipelined region that 
> contains intra-region ALL-to-ALL blocking edges to form a single execution 
> pipelined region, so that there is no cyclic input dependency between regions.
> Besides that, we should also take care to avoid cyclic cross-region 
> POINTWISE blocking edges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
