[jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

liyunzhang_intel (JIRA) Tue, 06 Jun 2017 01:58:28 -0700

    [ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038414#comment-16038414 ]


liyunzhang_intel commented on HIVE-11297:
-----------------------------------------

[~csun]: we cannot do that, because 
GenSparkProcContext#clonedPruningTableScanSet is passed as the topNodes of 
GenSparkWorkWalker#startWalking, and GenSparkWorkWalker splits the tree at 
minimum cost. So if there is only one top node (TS[1]), it splits the following tree
{noformat}
TS[1]-FIL[17]- SEL[18] -GBY[19]-SPARKPRUNINGSINK[20]
                    -SEL[21] -GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
into only one tree
{noformat}
TS[1]-FIL[17]- SEL[18] -GBY[19]-SPARKPRUNINGSINK[20]
{noformat}
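
To make the difference between the two trees explicit, here is a minimal sketch that writes the original operator tree as a parent-to-children map and enumerates its root-to-leaf paths. It is a toy model, not Hive code: the `TREE` map and `root_to_leaf_paths` helper are hypothetical names, and the branch point is assumed to be FIL[17], as the indentation of the diagram suggests.

```python
# Operator tree from the comment, as parent -> children.
# Assumption: the branch point is FIL[17], as the diagram's indentation suggests.
TREE = {
    "TS[1]": ["FIL[17]"],
    "FIL[17]": ["SEL[18]", "SEL[21]"],
    "SEL[18]": ["GBY[19]"],
    "GBY[19]": ["SPARKPRUNINGSINK[20]"],
    "SEL[21]": ["GBY[22]"],
    "GBY[22]": ["SPARKPRUNINGSINK[23]"],
}


def root_to_leaf_paths(tree, root):
    """Return every root-to-leaf path as a list of operator names."""
    children = tree.get(root, [])
    if not children:
        # An operator with no children is a leaf (here, a pruning sink).
        return [[root]]
    return [
        [root] + path
        for child in children
        for path in root_to_leaf_paths(tree, child)
    ]


paths = root_to_leaf_paths(TREE, "TS[1]")
for path in paths:
    print("-".join(path))
```

The full tree has two paths, one ending in each SPARKPRUNINGSINK, whereas the tree that survives the split keeps only the path ending in SPARKPRUNINGSINK[20].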

The GenSparkWork log:
{code}
[root@bdpe41 hive]# grep GenSparkWork logs/hive.log 
2017-06-06T16:34:12,527 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: TS[0]
2017-06-06T16:34:12,527 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: RS[2]
2017-06-06T16:34:19,070 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: RS[2]
2017-06-06T16:34:19,070 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: JOIN[5]
2017-06-06T16:34:19,070 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: RS[9]
2017-06-06T16:34:22,858 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Removing RS[2] as parent from JOIN[5]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Removing RS[4] as parent from JOIN[5]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: RS[9]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: GBY[10]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: FS[12]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Removing RS[9] as parent from GBY[10]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: FS[12]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: TS[1]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: RS[4]
2017-06-06T16:36:14,669 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Second pass. Leaf operator: RS[4] has common downstream work:org.apache.hadoop.hive.ql.plan.ReduceWork@7e7f72
2017-06-06T16:36:14,672 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: TS[1]
2017-06-06T16:36:14,672 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: SPARKPRUNINGSINK[20]
2017-06-06T16:38:22,338 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: SPARKPRUNINGSINK[20]
{code}
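
For anyone who wants to check the log mechanically, the sketch below pulls the "Root operator" / "Leaf operator" pairs out of GenSparkWork debug output. It is a hypothetical helper, not part of Hive: the regular expression is an assumption based only on the message format shown above, and the names `PAIR`, `root_leaf_pairs` and `sample` are illustrative.

```python
import re

# Matches only messages that start with "Root operator:" or "Leaf operator:",
# so "First pass. Leaf operator: ..." lines are skipped. \s+ also tolerates
# the line wrapping that mail clients add after the thread name.
PAIR = re.compile(r"GenSparkWork:\s+(Root|Leaf) operator:\s+(\w+\[\d+\])")


def root_leaf_pairs(log_text):
    """Pair each 'Root operator' message with the next 'Leaf operator' message."""
    pairs, root = [], None
    for kind, op in PAIR.findall(log_text):
        if kind == "Root":
            root = op
        elif root is not None:
            pairs.append((root, op))
            root = None
    return pairs


# Excerpt of the log above (last three messages).
sample = """
2017-06-06T16:36:14,672 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main]
spark.GenSparkWork: Root operator: TS[1]
2017-06-06T16:36:14,672 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main]
spark.GenSparkWork: Leaf operator: SPARKPRUNINGSINK[20]
2017-06-06T16:38:22,338 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main]
spark.GenSparkWork: First pass. Leaf operator: SPARKPRUNINGSINK[20]
"""

pairs = root_leaf_pairs(sample)
print(pairs)
```

Run against the full log above, the same function yields a pair ending in SPARKPRUNINGSINK[20] but none ending in SPARKPRUNINGSINK[23], which is the missing branch.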


> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that the table scan 
> does not have to be done multiple times.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
