[jira] [Commented] (SPARK-2176) extra unnecessary exchange operator in group by

Yin Huai (JIRA) Wed, 18 Jun 2014 09:59:26 -0700

    [ https://issues.apache.org/jira/browse/SPARK-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035941#comment-14035941 ]


Yin Huai commented on SPARK-2176:
---------------------------------

OK. Let me explain the cause of this bug.

When we create an execution.ExplainCommand, we use the executedPlan as the 
child of this ExplainCommand. However, this executedPlan is prepared for 
execution a second time when we generate the executedPlan for the 
ExplainCommand itself. In other words, prepareForExecution is called twice on 
the same physical plan. The first call has already bound the references 
(turned them into BoundReferences), so on the second call AddExchange cannot 
tell that the existing ExchangeOperator already provides the required 
partitioning: we create an ExchangeOperator from AttributeReferences, and 
prepareForExecution then changes those references to BoundReferences. As a 
result, an extra ExchangeOperator is inserted.

I think CommandStrategy should initialize the ExplainCommand with the 
sparkPlan (the input of prepareForExecution) instead of the executedPlan, so 
that the plan is prepared only once.
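
To make the mechanism concrete, here is a minimal, self-contained sketch of 
the double preparation. It is a toy model and not Spark's actual code: the 
case classes and the addExchange / bindReferences / prepareForExecution 
functions below only borrow the names used above, their definitions are 
hypothetical, and binding is simplified to "replace each partitioning key by 
its position".

```scala
// Toy model of the planner behaviour described above.
// ASSUMPTION: none of these definitions are Spark's real classes; they are
// hypothetical stand-ins that only reuse the names from the comment.
object DoublePrepareSketch {
  sealed trait Expr
  final case class AttributeReference(name: String, id: Int) extends Expr
  final case class BoundReference(ordinal: Int) extends Expr

  final case class HashPartitioning(keys: Seq[Expr], numPartitions: Int)

  sealed trait Plan
  final case class Scan(output: Seq[AttributeReference]) extends Plan
  final case class Aggregate(partial: Boolean,
                             groupBy: Seq[AttributeReference],
                             child: Plan) extends Plan
  final case class Exchange(partitioning: HashPartitioning, child: Plan) extends Plan

  // Stand-in for AddExchange: a final (non-partial) aggregate needs its input
  // hash-partitioned on the grouping keys. An Exchange is added unless the
  // child is already an Exchange with an *equal* partitioning.
  def addExchange(plan: Plan): Plan = plan match {
    case Aggregate(partial, keys, child) =>
      val newChild = addExchange(child)
      if (partial) Aggregate(partial, keys, newChild)
      else {
        val required = HashPartitioning(keys, 200)
        newChild match {
          case Exchange(p, _) if p == required => Aggregate(partial, keys, newChild)
          case _ => Aggregate(partial, keys, Exchange(required, newChild))
        }
      }
    case Exchange(p, child) => Exchange(p, addExchange(child))
    case scan: Scan => scan
  }

  // Stand-in for reference binding (simplified): every partitioning key of an
  // Exchange is replaced by a BoundReference to its position.
  def bindReferences(plan: Plan): Plan = plan match {
    case Exchange(HashPartitioning(keys, n), child) =>
      val bound = keys.zipWithIndex.map { case (_, i) => BoundReference(i) }
      Exchange(HashPartitioning(bound, n), bindReferences(child))
    case Aggregate(partial, keys, child) =>
      Aggregate(partial, keys, bindReferences(child))
    case scan: Scan => scan
  }

  def prepareForExecution(plan: Plan): Plan = bindReferences(addExchange(plan))

  def countExchanges(plan: Plan): Int = plan match {
    case Exchange(_, child)     => 1 + countExchanges(child)
    case Aggregate(_, _, child) => countExchanges(child)
    case _: Scan                => 0
  }

  val key   = AttributeReference("key", 25)
  val value = AttributeReference("value", 26)

  // Unprepared physical plan for "select * from src group by key".
  val sparkPlan: Plan =
    Aggregate(partial = false, Seq(key),
      Aggregate(partial = true, Seq(key), Scan(Seq(key, value))))

  val once: Plan  = prepareForExecution(sparkPlan) // normal query path
  val twice: Plan = prepareForExecution(once)      // what explain effectively did

  def main(args: Array[String]): Unit = {
    println(countExchanges(once))  // prints 1
    println(countExchanges(twice)) // prints 2
  }
}
```

In this sketch the second pass compares HashPartitioning(key#25) with the 
already bound HashPartitioning(BoundReference(0)); they are not equal, so a 
second Exchange is added. Handing the unprepared sparkPlan to the explain 
path, so that prepareForExecution runs only once, corresponds to the "once" 
value above.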

> extra unnecessary exchange operator in group by
> -----------------------------------------------
>
>                 Key: SPARK-2176
>                 URL: https://issues.apache.org/jira/browse/SPARK-2176
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Yin Huai
>
> {code}
> hql("explain select * from src group by key").collect().foreach(println)
> [ExplainCommand [plan#27:0]]
> [ Aggregate false, [key#25], [key#25,value#26]]
> [  Exchange (HashPartitioning [key#25:0], 200)]
> [   Exchange (HashPartitioning [key#25:0], 200)]
> [    Aggregate true, [key#25], [key#25]]
> [     HiveTableScan [key#25,value#26], (MetastoreRelation default, src, None), None]
> {code}
> There are two exchange operators.
> However, if we do not use explain...
> {code}
> hql("select * from src group by key")
> res4: org.apache.spark.sql.SchemaRDD = 
> SchemaRDD[8] at RDD at SchemaRDD.scala:100
> == Query Plan ==
> Aggregate false, [key#8], [key#8,value#9]
>  Exchange (HashPartitioning [key#8:0], 200)
>   Aggregate true, [key#8], [key#8]
>    HiveTableScan [key#8,value#9], (MetastoreRelation default, src, None), None
> {code}
> The plan is fine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
