[ https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936762#comment-17936762 ]

Seonggon Namgung commented on HIVE-26986:
-----------------------------------------

[~okumin], [~dkuzmenko] Yes, it seems that this patch causes the OOM issue: it 
merges two TableScan operators and creates a Map vertex that contains two 
parallel MapJoin operations, forming a TS-{MaJ, MaJ} pattern. I also confirmed 
that the OOM occurred in the newly merged vertex, Map 3.
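A toy model of the pattern, as a minimal sketch only: the `Op` class and both 
helpers are hypothetical and are not Hive's optimizer code. Before the merge, 
each TableScan feeds one MapJoin, so each Map vertex holds one hash table; after 
the merge, a single TableScan feeds both MapJoins, so one task must hold both.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical operator node; kind is "TS" (TableScan) or "MAPJOIN"."""
    kind: str
    name: str
    children: list[Op] = field(default_factory=list)


def merge_table_scans(ts_a: Op, ts_b: Op) -> Op:
    """Merge two TableScans over the same table into one TS feeding both
    original subtrees (a simplified view of what SWO does)."""
    assert ts_a.kind == ts_b.kind == "TS"
    return Op("TS", ts_a.name, ts_a.children + ts_b.children)


def count_mapjoins(root: Op) -> int:
    """Count MapJoin operators reachable from root, i.e. within one vertex."""
    return int(root.kind == "MAPJOIN") + sum(count_mapjoins(c) for c in root.children)


# Hypothetical table "t" scanned twice, each scan feeding its own MapJoin.
ts1 = Op("TS", "t", [Op("MAPJOIN", "MaJ_1")])
ts2 = Op("TS", "t", [Op("MAPJOIN", "MaJ_2")])
merged = merge_table_scans(ts1, ts2)

print(count_mapjoins(ts1))     # 1: one hash table per task before the merge
print(count_mapjoins(merged))  # 2: TS-{MaJ, MaJ}, both hash tables in one task
```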

I noticed that this qfile had been disabled for some time due to OOM issues 
(HIVE-26820, HIVE-27695). It seems that even the increased memory allocation is 
insufficient to handle two MapJoins running concurrently within a single task.

As a quick fix, we could disable shared work optimization (SWO) for this qfile 
by setting hive.optimize.shared.work=false. Since this qfile does not test SWO, 
disabling it is likely safe in this case.

Since the same issue could also happen in production environments, we might 
prevent it by rejecting TableScan merges that would put too heavy a MapJoin 
workload on a single task. I don’t have a concrete implementation idea at the 
moment, but it could be handled similarly to HIVE-28548 or HIVE-28549.
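One possible shape for such a check, as a minimal sketch under stated 
assumptions: `MapJoinEstimate`, `allow_table_scan_merge`, the per-MapJoin hash 
table size estimates, and the per-task memory budget are all hypothetical. This 
is not how HIVE-28548 or HIVE-28549 work, and where the estimates and budget 
would come from is left open, as in the comment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MapJoinEstimate:
    """Hypothetical estimate of one MapJoin's in-memory hash table size."""
    name: str
    hash_table_bytes: int


def allow_table_scan_merge(
    left: list[MapJoinEstimate],
    right: list[MapJoinEstimate],
    task_memory_bytes: int,
) -> bool:
    """Allow the merge only if every MapJoin that would end up in the merged
    task fits into that task's memory budget at the same time."""
    total = sum(mj.hash_table_bytes for mj in left + right)
    return total <= task_memory_bytes


MB = 1024 * 1024
a = [MapJoinEstimate("MaJ_1", 600 * MB)]
b_small = [MapJoinEstimate("MaJ_2", 300 * MB)]
b_large = [MapJoinEstimate("MaJ_2", 600 * MB)]

print(allow_table_scan_merge(a, b_small, 1024 * MB))  # True: 900 MB fits
print(allow_table_scan_merge(a, b_large, 1024 * MB))  # False: 1200 MB does not
```

Summing the estimates is deliberately conservative: it assumes all hash tables 
in the merged vertex are resident at once, which is the TS-{MaJ, MaJ} situation 
described above.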

Additionally, and unrelated to the OOM issue, I noticed that the comment in 
hybridgrace_hashjoin_2.q does not match its output file. The qfile states that 
it tests n-way joins, but the query plan does not include any n-way join. This 
discrepancy seems to have existed since HIVE-21189, which changed the default 
value of hive.merge.nway.joins to false.

> ParallelEdgeFixer adds redundant reduce sink operators
> ------------------------------------------------------
>
>                 Key: HIVE-26986
>                 URL: https://issues.apache.org/jira/browse/HIVE-26986
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Major
>              Labels: hive-4.1.0-must, pull-request-available
>             Fix For: 4.1.0
>
>         Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is 
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as 
> a parallel edge.
> We observed this problem by comparing OperatorGraph and the Tez DAG when 
> running TPC-DS query 71 on a 1TB ORC-format managed table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
