[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053446#comment-16053446 ]
liyunzhang_intel commented on HIVE-11297:
-----------------------------------------

[~csun]: When I print the operator tree of multi_column_single_source.q while debugging in [SplitOpTreeForDPP|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java#L75], the query is:
{code}
set hive.execution.engine=spark;
set hive.auto.convert.join.noconditionaltask.size=20;
set hive.spark.dynamic.partition.pruning=true;
select count(*) from srcpart join srcpart_date_hour on (srcpart.ds = srcpart_date_hour.ds and srcpart.hr = srcpart_date_hour.hr) where srcpart_date_hour.`date` = '2008-04-08' and srcpart_date_hour.hour = 11;
{code}
and the physical plan is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
             -SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
             -SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}
{noformat}RS[4], SEL[18] and SEL[21] are children of FIL[17].{noformat}

bq. I think in the original code the parent node of all branches is a filter op, but now it is changed

I don't think so. I think the filter op is still {noformat}FIL[17]{noformat}. What differs from before is how the tree is split.
Before, we split the above tree into three trees:
{noformat}
tree1: TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
tree2: TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
tree3: TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
Now we split the above tree into two trees, so both pruning sinks share a single table scan:
{noformat}
tree1: TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
tree2: TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
                    -SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}

> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates partition info for more than one partition column, multiple operator trees are created, which all start from the same table scan op but have different spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to do the table scan multiple times.
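The difference between the old and new split of the plan above can be illustrated with a small model. This is a minimal Python sketch, not Hive's SplitOpTreeForDPP code: `Op`, `chain`, `paths`, `split_old` and `split_new` are hypothetical names, and the rule that the main tree keeps every non-pruning branch while all SPARKPRUNINGSINK branches share one copied tree is an assumption based on the two-tree layout in the comment. In this model each resulting tree begins with its own copy of TS[1], so the number of trees equals the number of table scans.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for a Hive operator: a display name plus child operators."""
    name: str
    children: list[Op] = field(default_factory=list)


def chain(*names: str) -> Op:
    """Build a linear chain such as SEL-GBY-SINK and return its first operator."""
    head = Op(names[0])
    cur = head
    for name in names[1:]:
        nxt = Op(name)
        cur.children.append(nxt)
        cur = nxt
    return head


def paths(op: Op) -> list[str]:
    """Every root-to-leaf path, written in the comment's A-B-C notation."""
    if not op.children:
        return [op.name]
    return [f"{op.name}-{rest}" for child in op.children for rest in paths(child)]


def is_pruning_branch(op: Op) -> bool:
    """True if every leaf under op is a SPARKPRUNINGSINK operator."""
    return all(p.rsplit("-", 1)[-1].startswith("SPARKPRUNINGSINK") for p in paths(op))


def with_branches(prefix: list[str], branches: list[Op]) -> Op:
    """Build a fresh copy of the shared prefix (TS-FIL) and attach branches to its last op."""
    root = chain(*prefix)
    tail = root
    while tail.children:
        tail = tail.children[0]
    tail.children = list(branches)
    return root


def split_old(prefix: list[str], branches: list[Op]) -> list[Op]:
    """Old split as described: one tree per branch, so TS-FIL is repeated per branch."""
    return [with_branches(prefix, [b]) for b in branches]


def split_new(prefix: list[str], branches: list[Op]) -> list[Op]:
    """New split as described: all pruning-sink branches share a single tree."""
    main = [b for b in branches if not is_pruning_branch(b)]
    dpp = [b for b in branches if is_pruning_branch(b)]
    trees = [with_branches(prefix, main)]
    if dpp:
        trees.append(with_branches(prefix, dpp))
    return trees


# The plan printed in the comment: FIL[17] has three children.
PREFIX = ["TS[1]", "FIL[17]"]
BRANCHES = [
    chain("RS[4]", "JOIN[5]", "GBY[8]", "RS[9]", "GBY[10]", "FS[12]"),
    chain("SEL[18]", "GBY[19]", "SPARKPRUNINGSINK[20]"),
    chain("SEL[21]", "GBY[22]", "SPARKPRUNINGSINK[23]"),
]

old_trees = split_old(PREFIX, BRANCHES)
new_trees = split_new(PREFIX, BRANCHES)

# In this model each tree has its own TS[1], so len(trees) counts the table scans.
for label, trees in (("old", old_trees), ("new", new_trees)):
    print(f"{label}: {len(trees)} table scans")
    for i, tree in enumerate(trees, 1):
        for path in paths(tree):
            print(f"  tree{i}: {path}")
```

Under these assumptions the old split yields three trees and the new split yields two, with both pruning sinks hanging off one copy of FIL[17]. That one saved scan of the table is the optimization the issue description asks for.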