[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

liyunzhang_intel (JIRA) Sun, 18 Jun 2017 20:42:52 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053446#comment-16053446
 ]


liyunzhang_intel commented on HIVE-11297:
-----------------------------------------

[~csun]: I printed the operator tree of multi_column_single_source.q while 
debugging in 
[SplitOpTreeForDPP|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java#L75]. 
The query is:
{code}
set hive.execution.engine=spark; 
set hive.auto.convert.join.noconditionaltask.size=20; 
set hive.spark.dynamic.partition.pruning=true;
select count(*) from srcpart join srcpart_date_hour on (srcpart.ds = 
srcpart_date_hour.ds and srcpart.hr = srcpart_date_hour.hr) where 
srcpart_date_hour.`date` = '2008-04-08' and srcpart_date_hour.hour = 11;
{code}

The physical plan is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
             -SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
             -SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}
{noformat}RS[4], SEL[18], and SEL[21] are children of FIL[17]{noformat}
bq. I think in the original code the parent node of all branches is a filter 
op, but now it is changed
I don't think so. I think the filter op is still {noformat}FIL[17]{noformat}. 
The difference between the previous behavior and the current one is how the 
tree is split. Before, we split the above tree into three trees:
{noformat}
tree1: TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
tree2: TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
tree3: TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}

Now we split the above tree into two trees:
{noformat}
tree1: TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
tree2: TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
                   -SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
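
To make the difference concrete, here is a toy Python sketch of the two splitting strategies. It is not the actual SplitOpTreeForDPP code: the tree is modeled as a shared prefix plus a list of branches, and the names split_old, split_new, render and scan_count are made up for illustration. It only shows that the old split clones the TS[1]-FIL[17] prefix once per pruning sink, while the new split clones it once for all pruning sinks.

```python
PRUNING_SINK = "SPARKPRUNINGSINK"

# Shared prefix and the three branches under FIL[17], copied from the plan above.
PREFIX = ["TS[1]", "FIL[17]"]
BRANCHES = [
    ["RS[4]", "JOIN[5]", "GBY[8]", "RS[9]", "GBY[10]", "FS[12]"],
    ["SEL[18]", "GBY[19]", "SPARKPRUNINGSINK[20]"],
    ["SEL[21]", "GBY[22]", "SPARKPRUNINGSINK[23]"],
]


def ends_in_pruning_sink(branch):
    """True if the last operator of the branch is a pruning sink."""
    return branch[-1].startswith(PRUNING_SINK)


def split_old(prefix, branches):
    """Old behavior (toy model): one extra tree per pruning-sink branch."""
    main = [b for b in branches if not ends_in_pruning_sink(b)]
    pruning = [b for b in branches if ends_in_pruning_sink(b)]
    return [(list(prefix), main)] + [(list(prefix), [b]) for b in pruning]


def split_new(prefix, branches):
    """New behavior (toy model): all pruning-sink branches share one tree."""
    main = [b for b in branches if not ends_in_pruning_sink(b)]
    pruning = [b for b in branches if ends_in_pruning_sink(b)]
    trees = [(list(prefix), main)]
    if pruning:
        trees.append((list(prefix), pruning))
    return trees


def render(tree):
    """Return the root-to-leaf paths of a tree as dash-joined strings."""
    prefix, branches = tree
    return ["-".join(prefix + branch) for branch in branches]


def scan_count(trees):
    """Number of trees rooted at a table scan, i.e. scans of the table."""
    return sum(1 for prefix, _ in trees if prefix[0].startswith("TS"))


if __name__ == "__main__":
    for label, trees in (("old", split_old(PREFIX, BRANCHES)),
                         ("new", split_new(PREFIX, BRANCHES))):
        print(label, "scans:", scan_count(trees))
        for i, tree in enumerate(trees, start=1):
            for path in render(tree):
                print("  tree%d: %s" % (i, path))
```

In this model the old split yields three trees rooted at TS[1] and the new split yields two, which matches the listings above.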

> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, 
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition columns, multiple operator trees 
> are created, which all start from the same table scan op, but have different 
> spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do 
> table scan multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
