[jira] [Commented] (TEZ-4569) SCATTER_GATHER + BROADCAST hangs on DAG Recovery

2024-06-16 Thread Shohei Okumiya (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17855389#comment-17855389
 ] 

Shohei Okumiya commented on TEZ-4569:
-

We have another discussion here.

https://lists.apache.org/thread/q7cnz81k39wzd29hrp08o5vohbrdlhk2

> SCATTER_GATHER + BROADCAST hangs on DAG Recovery
> 
>
> Key: TEZ-4569
> URL: https://issues.apache.org/jira/browse/TEZ-4569
> Project: Apache Tez
>  Issue Type: Improvement
>Affects Versions: 0.9.2, 0.10.3
>Reporter: Shohei Okumiya
>Assignee: Shohei Okumiya
>Priority: Major
> Attachments: image-2024-06-11-20-45-12-540.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A Tez DAG fails to initialize itself when the Application Master is preempted 
> at a particular moment.
>  
> The problem typically happens with a Hive Map Join (Broadcast Hash Join) when 
> the broadcast side is multi-staged. In the following case, the smaller side 
> includes one aggregation, so the condition is satisfied.
>  
> {code:sql}
> CREATE TABLE small AS SELECT 1 AS id;
> CREATE TABLE big AS SELECT 1 AS id UNION ALL SELECT 2 AS id UNION ALL SELECT 3 AS id;
> SELECT *
> FROM big
> JOIN (SELECT id, count(*) AS num FROM small GROUP BY id) s ON big.id = s.id;
> {code}
> Once it happens, a retried AM fails to configure the Map Join vertex. In the 
> following case, Map 1 never starts.
>  
>  
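> {code:java}
> --------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> --------------------------------------------------------------------------------------------
> Map 2 ..         container     SUCCEEDED      1          1        0        0       0       1
> Reducer 3 ..     container     SUCCEEDED      1          1        0        0       0       0
> Map 1            container  INITIALIZING     -1          0        0       -1       0       0
> --------------------------------------------------------------------------------------------
> {code}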
> Tez starts Map 2 and Map 1 once their splits are configured. The hang occurs 
> when the AM is retried before it starts Reducer 3.
> !image-2024-06-11-20-45-12-540.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TEZ-4569) SCATTER_GATHER + BROADCAST hangs on DAG Recovery

2024-06-11 Thread Shohei Okumiya (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854034#comment-17854034
 ] 

Shohei Okumiya commented on TEZ-4569:
-

From a quick check, this part looks suspicious. The validation assumes that all 
predecessors have been configured by the time any successor starts. If a vertex 
has both a broadcast edge and a data source as inputs, that assumption can be 
wrong.

[https://github.com/apache/tez/blob/rel/release-0.10.3/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L2846]

I can think of two possible approaches.
 # Don't start any successor until all of its predecessors have been initialized.
 # Restore the AM state correctly even when a vertex has false-started.
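
As a rough illustration of the first approach, the sketch below gates a vertex's start on all of its direct predecessors being configured. It is plain Java and not Tez code: `StartGateSketch`, `canStart`, and the map-based DAG model are hypothetical names used only for illustration, and the vertex names are borrowed from the Hive example in the issue description. Map 1's data source is left out of the model because it is a root input, not a predecessor vertex.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of approach 1; none of this is Tez API. */
final class StartGateSketch {

    /**
     * Returns true only if every direct predecessor of the given vertex
     * is already in the configured set.
     */
    static boolean canStart(String vertex,
                            Map<String, List<String>> predecessors,
                            Set<String> configured) {
        for (String p : predecessors.getOrDefault(vertex, List.of())) {
            if (!configured.contains(p)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Map 2 --SCATTER_GATHER--> Reducer 3 --BROADCAST--> Map 1
        Map<String, List<String>> predecessors = Map.of(
                "Reducer 3", List.of("Map 2"),
                "Map 1", List.of("Reducer 3"));

        // Reducer 3 is not configured yet, so Map 1 must wait.
        System.out.println(canStart("Map 1", predecessors, Set.of("Map 2")));
        // Once Reducer 3 is configured, Map 1 may start.
        System.out.println(canStart("Map 1", predecessors, Set.of("Map 2", "Reducer 3")));
    }
}
```

With a gate like this, Map 1 would wait for Reducer 3 even when its own splits are ready, which may delay work that could otherwise begin early. The second approach avoids that delay but has to handle the false-started state during recovery.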



[jira] [Commented] (TEZ-4569) SCATTER_GATHER + BROADCAST hangs on DAG Recovery

2024-06-11 Thread Shohei Okumiya (Jira)


[ 
https://issues.apache.org/jira/browse/TEZ-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854024#comment-17854024
 ] 

Shohei Okumiya commented on TEZ-4569:
-

I first created a test case to reproduce the issue. 
[testTableScanTemporalFailure|https://github.com/okumin/tez/commit/deac035274bd0b958fbfdf3557dc7120c16fddc5#diff-ad65a331fa51a07f3cc5301ca7df09c199e9730a6f889bc8b1859554ccfc0519R199-R217]
 is the most straightforward reproduction. The client log below shows MapJoin 
stuck at TotalTasks: -1 while the DAG stays RUNNING.
{code:java}
2024-06-11 20:18:55,719 INFO  [Time-limited test] client.DAGClientImpl (DAGClientImpl.java:log(709)) - DAG: State: RUNNING Progress: 200% TotalTasks: 1 Succeeded: 2 Running: 0 Failed: 0 Killed: 0 KilledTaskAttempts: 1
2024-06-11 20:18:55,720 INFO  [Time-limited test] client.DAGClientImpl (DAGClientImpl.java:log(709)) -     VertexStatus: VertexName: TableScan Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0 KilledTaskAttempts: 1
2024-06-11 20:18:55,721 INFO  [Time-limited test] client.DAGClientImpl (DAGClientImpl.java:log(709)) -     VertexStatus: VertexName: Aggregation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
2024-06-11 20:18:55,721 INFO  [Time-limited test] client.DAGClientImpl (DAGClientImpl.java:log(709)) -     VertexStatus: VertexName: MapJoin Progress: 0% TotalTasks: -1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
2024-06-11 20:19:00,756 INFO  [Time-limited test] client.DAGClientImpl (DAGClientImpl.java:log(709)) - DAG: State: RUNNING Progress: 200% TotalTasks: 1 Succeeded: 2 Running: 0 Failed: 0 Killed: 0 KilledTaskAttempts: 1
2024-06-11 20:19:00,757 INFO  [Time-limited test] client.DAGClientImpl (DAGClientImpl.java:log(709)) -     VertexStatus: VertexName: TableScan Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0 KilledTaskAttempts: 1
2024-06-11 20:19:00,757 INFO  [Time-limited test] client.DAGClientImpl (DAGClientImpl.java:log(709)) -     VertexStatus: VertexName: Aggregation Progress: 100% TotalTasks: 1 Succeeded: 1 Running: 0 Failed: 0 Killed: 0
2024-06-11 20:19:00,758 INFO  [Time-limited test] client.DAGClientImpl (DAGClientImpl.java:log(709)) -     VertexStatus: VertexName: MapJoin Progress: 0% TotalTasks: -1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
{code}
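
The DAG-level line reports Progress: 200% with TotalTasks: 1 and Succeeded: 2. One plausible reading, which is an assumption and not something confirmed in this thread, is that the DAG totals are summed over the vertices, so MapJoin's unconfigured TotalTasks of -1 is added in as well. The sketch below only redoes that arithmetic with the per-vertex numbers from the log; `ProgressSketch` and its methods are hypothetical names, not Tez code.

```java
/** Hypothetical arithmetic check; none of this is Tez API. */
final class ProgressSketch {

    /** Sums per-vertex counters into a DAG-level counter. */
    static int sum(int[] perVertex) {
        int total = 0;
        for (int v : perVertex) {
            total += v;
        }
        return total;
    }

    /** Integer percentage of succeeded tasks over total tasks. */
    static int percent(int succeeded, int total) {
        if (total == 0) {
            throw new IllegalArgumentException("total must not be zero");
        }
        return 100 * succeeded / total;
    }

    public static void main(String[] args) {
        // Per-vertex values from the log: TableScan, Aggregation, MapJoin.
        int[] totalTasks = {1, 1, -1}; // -1 means MapJoin is not configured yet
        int[] succeeded = {1, 1, 0};

        int total = sum(totalTasks);
        int done = sum(succeeded);
        System.out.println("TotalTasks=" + total
                + " Succeeded=" + done
                + " Progress=" + percent(done, total) + "%");
    }
}
```

Under that assumption, the odd 200% figure is another symptom of MapJoin never leaving its unconfigured state.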
