[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.
    [ https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794801#comment-17794801 ]

Seonggon Namgung commented on HIVE-26986:
-----------------------------------------

[~dkuzmenko], I just created an OperatorGraph before and after applying PEF, and generated Graphviz files using the OperatorGraph.toDot() method.

> A DAG created by OperatorGraph is not equal to the Tez DAG.
> -----------------------------------------------------------
>
>                 Key: HIVE-26986
>                 URL: https://issues.apache.org/jira/browse/HIVE-26986
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as
> a parallel edge.
> We observe this problem by comparing OperatorGraph and Tez DAG when running
> TPC-DS query 71 on a 1TB ORC format managed table.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.
    [ https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794791#comment-17794791 ]

Denys Kuzmenko commented on HIVE-26986:
---------------------------------------

[~seonggon], could you please share how one could create an OperatorGraph for the query? Under debug, with OperatorGraph.toDot()?
[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.
    [ https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789109#comment-17789109 ]

Seonggon Namgung commented on HIVE-26986:
-----------------------------------------

[~dkuzmenko], we don't have a correctness issue with query71 because HIVE-27006 solves HIVE-26660. I'll create a new link from HIVE-26660 to HIVE-27006 and close HIVE-26660.
[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.
    [ https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789079#comment-17789079 ]

Denys Kuzmenko commented on HIVE-26986:
---------------------------------------

[~seonggon], could you please clarify, do we have a correctness issue with query71? If not, let's close HIVE-26660 or remove the link from the current ticket since it's only about performance.
[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.
    [ https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786224#comment-17786224 ]

Seonggon Namgung commented on HIVE-26986:
-----------------------------------------

@kkasa

1. This issue is not about data correctness. It addresses the insertion of unnecessary ReduceSink operators, which causes unnecessary shuffles at runtime. The unnecessary insertion is performed by ParallelEdgeFixer (PEF), which makes a wrong decision because OperatorGraph creates a wrong DAG from the given query plan. My previous comment explains how OperatorGraph wrongly groups operators into a vertex (a cluster, in OperatorGraph terms).

Since this issue originates from OperatorGraph, not from PEF or SharedWorkOptimizer (SWO), the submitted PR introduces TestOperatorGraph, which tests the behaviour of OperatorGraph. You can reproduce the problem by running this test on the master branch. The following explains the added test in more detail.

The test compares the two DAGs generated by OperatorGraph and TezCompiler. The following graph represents the query plan used in the test.

TS1┐
TS2┴UNION─SEL─RS─GBY─RS

The correct DAG corresponding to the query plan should be:

Map1: {TS1, SEL, RS1}
Map2: {TS2, SEL, RS1}
Reduce: {GBY, RS2}

But the current OperatorGraph groups the operators into 2 clusters as follows:

Cluster1: {TS1, TS2, UNION, SEL, RS1}
Cluster2: {GBY, RS2}

2. As I mentioned above, this issue is unrelated to data correctness. Moreover, PEF is applied to a query plan regardless of the value of `hive.optimize.shared.work.parallel.edge.support`. I think the test attached in the PR is sufficient to verify this issue.

FYI, `hive.optimize.shared.work.parallel.edge.support` controls the types of edges that are allowed to form a parallel edge. If it is set to true, DynamicPartitionPruning (DPP), SemiJoinReduction, and Broadcast edges can form a parallel edge. If not, only DPP edges can.

As a consequence, SWO can create parallel edges regardless of the value of `hive.optimize.shared.work.parallel.edge.support`. So Hive always runs PEF after SWO in order to resolve parallel edges by adding extra RS operators.

> A DAG created by OperatorGraph is not equal to the Tez DAG.
> -----------------------------------------------------------
>
>                 Key: HIVE-26986
>                 URL: https://issues.apache.org/jira/browse/HIVE-26986
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Major
>              Labels: hive-4.0.0-must, pull-request-available
>         Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as
> a parallel edge.
> We observe this problem by comparing OperatorGraph and Tez DAG when running
> TPC-DS query 71 on a 1TB ORC format managed table.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.
    [ https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784956#comment-17784956 ]

Krisztian Kasa commented on HIVE-26986:
---------------------------------------

[~seonggon]

1. It is not clear why adding an extra concentrator RS leads to a data correctness issue. Could you please share a simple repro on a small dataset that contains only the necessary records? It can also be added to the PR to extend the test coverage of SWO and ParallelEdgeFixer.

2. IIUC parallel edge support can be controlled via a config setting. Could you please verify whether the correctness issue still occurs with
{code:java}
set hive.optimize.shared.work.parallel.edge.support=false;
{code}
[jira] [Commented] (HIVE-26986) A DAG created by OperatorGraph is not equal to the Tez DAG.
    [ https://issues.apache.org/jira/browse/HIVE-26986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680933#comment-17680933 ]

Seonggon Namgung commented on HIVE-26986:
-----------------------------------------

The attached image files ("Query71 TezDAG.png" and "Query71 OperatorGraph.png") show the Tez DAG and the OperatorGraph of TPC-DS query71. I enabled tez.generate.debug.artifacts to get a dot file of the Tez DAG. The OperatorGraph is created after ParallelEdgeFixer is applied.

The number of clusters in the OperatorGraph is 10, but the number of vertices in the Tez DAG is 12. The difference comes from cluster 3 of the OperatorGraph, which contains 3 TS operators and a UNION operator.

The current OperatorGraph creates a singleton cluster for each operator and merges each parent operator's cluster into its child operator's cluster unless the parent is a ReduceSink operator. As a result, a cluster can contain multiple root operators, and such a cluster cannot form a single vertex in the Tez DAG.

This mismatch between the Tez DAG and the OperatorGraph causes false positives when detecting parallel edges and leads to the insertion of unnecessary concentrator RS operators.

> A DAG created by OperatorGraph is not equal to the Tez DAG.
> -----------------------------------------------------------
>
>                 Key: HIVE-26986
>                 URL: https://issues.apache.org/jira/browse/HIVE-26986
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Major
>         Attachments: Query71 OperatorGraph.png, Query71 TezDAG.png
>
>
> A DAG created by OperatorGraph is not equal to the corresponding DAG that is
> submitted to Tez.
> Because of this problem, ParallelEdgeFixer reports a pair of normal edges as
> a parallel edge.
> We observe this problem by comparing OperatorGraph and Tez DAG when running
> TPC-DS query 71 on a 1TB ORC format managed table.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
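
The merge rule described in the comment above (one cluster per operator, then merge each parent's cluster into its child's cluster unless the parent is a ReduceSink) can be illustrated with a minimal, self-contained sketch. This is a toy model and not Hive's OperatorGraph API: the class ClusterSketch, its methods, and the convention of recognising operators by name prefix (RS, TS) are assumptions made only for illustration. It applies the rule to the plan quoted in the thread, where TS1 and TS2 feed UNION, followed by SEL, RS1, GBY and RS2.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model of the merge rule; not part of Hive.
public class ClusterSketch {

    // Parent -> child edges of the plan: TS1 and TS2 feed UNION, then SEL, RS1, GBY, RS2.
    public static String[][] examplePlan() {
        return new String[][] {
            {"TS1", "UNION"}, {"TS2", "UNION"}, {"UNION", "SEL"},
            {"SEL", "RS1"}, {"RS1", "GBY"}, {"GBY", "RS2"}
        };
    }

    // Follow representative links until reaching the representative of op's cluster.
    static String find(Map<String, String> rep, String op) {
        while (!rep.get(op).equals(op)) {
            op = rep.get(op);
        }
        return op;
    }

    // Start with a singleton cluster per operator, then merge the parent's cluster
    // into the child's cluster unless the parent is a ReduceSink (name starts with "RS").
    public static List<Set<String>> cluster(String[][] edges) {
        Map<String, String> rep = new HashMap<>();
        for (String[] e : edges) {
            rep.putIfAbsent(e[0], e[0]);
            rep.putIfAbsent(e[1], e[1]);
        }
        for (String[] e : edges) {
            if (!e[0].startsWith("RS")) {
                rep.put(find(rep, e[0]), find(rep, e[1]));
            }
        }
        // Group operators by representative; sorted containers keep the output stable.
        Map<String, Set<String>> byRep = new TreeMap<>();
        for (String op : rep.keySet()) {
            byRep.computeIfAbsent(find(rep, op), k -> new TreeSet<>()).add(op);
        }
        return new ArrayList<>(byRep.values());
    }

    // Count the TableScan roots (names starting with "TS") in one cluster.
    public static long tableScans(Set<String> cluster) {
        return cluster.stream().filter(op -> op.startsWith("TS")).count();
    }

    public static void main(String[] args) {
        for (Set<String> c : cluster(examplePlan())) {
            System.out.println(c + " table scans: " + tableScans(c));
        }
    }
}
```

Under these assumptions the sketch yields two clusters, and the first one holds both TS1 and TS2 together with UNION, SEL and RS1. That matches Cluster1 in the thread and shows why such a cluster cannot map to a single Tez vertex, which would have one root.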