[ https://issues.apache.org/jira/browse/HIVE-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar updated HIVE-16507:
--------------------------------
    Status: Patch Available  (was: Open)

> Hive Explain User-Level may print out "Vertex dependency in root stage" twice
> -----------------------------------------------------------------------------
>
> Key: HIVE-16507
> URL: https://issues.apache.org/jira/browse/HIVE-16507
> Project: Hive
> Issue Type: Bug
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Attachments: HIVE-16507.1.patch
>
>
> User-level explain plans have a section titled {{Vertex dependency in root
> stage}} - which (according to the name) prints out the dependencies between
> all vertices that are in the root stage.
>
> This logic is controlled by {{DagJsonParser#print}} and it may print out
> {{Vertex dependency in root stage}} twice.
>
> The logic in this method first extracts all stages and plans. It then
> iterates over all the stages, and if a stage contains any edges, it prints
> them out.
>
> If we want to be consistent with the statement {{Vertex dependency in root
> stage}}, then we should add a check to see whether the stage being processed
> during the iteration is the root stage.
>
> Alternatively, we could print out the edges for each stage and change the
> line from {{Vertex dependency in root stage}} to {{Vertex dependency in
> [stage-id]}}.
>
> I'm not sure if it's possible for Hive-on-Tez to create a plan with a
> non-root stage that contains edges, but it is possible for Hive-on-Spark
> (support added for HoS in HIVE-11133).
>
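The second alternative described in the issue (one header per stage, naming the stage) could look roughly like the sketch below. It is a simplified illustration only: the class {{VertexDependencySketch}}, the method {{renderDependencies}}, and the map-of-edges model are hypothetical and are not the real {{DagJsonParser}} API or the attached patch. The assignment of edges to Stage-2 and Stage-1 is inferred from the example plan in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the per-stage printing; not Hive code. */
public class VertexDependencySketch {

  /**
   * Emits one "Vertex dependency in <stage-id>" header for every stage that
   * has edges, followed by that stage's edges. Stages without edges emit
   * nothing, so no header is ever repeated with the same label.
   */
  public static List<String> renderDependencies(Map<String, List<String>> edgesByStage) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<String>> stage : edgesByStage.entrySet()) {
      if (stage.getValue().isEmpty()) {
        continue; // nothing to report for this stage
      }
      out.add("Vertex dependency in " + stage.getKey());
      out.addAll(stage.getValue());
    }
    return out;
  }

  public static void main(String[] args) {
    // Insertion-ordered map so stages print in the order they were added.
    Map<String, List<String>> edgesByStage = new LinkedHashMap<>();
    edgesByStage.put("Stage-2", Arrays.asList(
        "Reducer 10 <- Map 9 (GROUP)",
        "Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)",
        "Reducer 13 <- Map 12 (GROUP)"));
    edgesByStage.put("Stage-1", Arrays.asList(
        "Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)",
        "Reducer 3 <- Reducer 2 (GROUP)"));
    edgesByStage.put("Stage-0", Collections.<String>emptyList());

    for (String line : renderDependencies(edgesByStage)) {
      System.out.println(line);
    }
  }
}
```

With this shape, each dependency block is labelled with its own stage id instead of repeating "root stage". The first alternative (printing edges only for the root stage) would instead replace the emptiness check with a comparison against whichever stage the parser treats as the root.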
> Example for HoS:
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.explain.user=true;
> set hive.spark.dynamic.partition.pruning=true;
> EXPLAIN select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> Prints
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 10 <- Map 9 (GROUP)
> Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
> Reducer 13 <- Map 12 (GROUP)
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
> Reducer 3 <- Reducer 2 (GROUP)
> Reducer 5 <- Map 4 (GROUP)
> Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
> Reducer 8 <- Map 7 (GROUP)
> Stage-0
> Fetch Operator
> limit:-1
> Stage-1
> Reducer 3
> File Output Operator [FS_34]
> Group By Operator [GBY_32] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
> <-Reducer 2 [GROUP]
> GROUP [RS_31]
> Group By Operator [GBY_30] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Join Operator [JOIN_28] (rows=2200 width=10)
> condition map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
> <-Map 1 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_26]
> PartitionCols:_col0
> Select Operator [SEL_2] (rows=2000 width=10)
> Output:["_col0"]
> TableScan [TS_0] (rows=2000 width=10)
> default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
> <-Reducer 6 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_27]
> PartitionCols:_col0
> Group By Operator [GBY_24] (rows=1 width=184)
> Output:["_col0"],keys:KEY._col0
> <-Reducer 5 [GROUP]
> GROUP [RS_23]
> PartitionCols:_col0
> Group By Operator [GBY_22] (rows=2 width=184)
> Output:["_col0"],keys:_col0
> Filter Operator [FIL_9] (rows=1 width=184)
> predicate:_col0 is not null
> Group By Operator [GBY_7] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(VALUE._col0)"]
> <-Map 4 [GROUP]
> GROUP [RS_6]
> Group By Operator [GBY_5] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(ds)"]
> Select Operator [SEL_4] (rows=2000 width=10)
> Output:["ds"]
> TableScan [TS_3] (rows=2000 width=10)
> default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
> <-Reducer 8 [GROUP]
> GROUP [RS_23]
> PartitionCols:_col0
> Group By Operator [GBY_22] (rows=2 width=184)
> Output:["_col0"],keys:_col0
> Filter Operator [FIL_17] (rows=1 width=184)
> predicate:_col0 is not null
> Group By Operator [GBY_15] (rows=1 width=184)
> Output:["_col0"],aggregations:["min(VALUE._col0)"]
> <-Map 7 [GROUP]
> GROUP [RS_14]
> Group By Operator [GBY_13] (rows=1 width=184)
> Output:["_col0"],aggregations:["min(ds)"]
> Select Operator [SEL_12] (rows=2000 width=10)
> Output:["ds"]
> TableScan [TS_11] (rows=2000 width=10)
> default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
> Stage-2
> Reducer 11
> {code}
> So there are two sections that say {{Vertex dependency in root stage}}.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)