[ https://issues.apache.org/jira/browse/SPARK-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035941#comment-14035941 ]
Yin Huai commented on SPARK-2176:
---------------------------------

OK. Let me explain the cause of this bug. When we create an execution.ExplainCommand, we use the executedPlan as the child of this ExplainCommand. But this executedPlan is prepared for execution a second time when we generate the executedPlan for the ExplainCommand itself. Basically, prepareForExecution is called twice on the same physical plan.

We use AttributeReferences to create an ExchangeOperator, and those references are changed to BoundReferences once prepareForExecution has been called. So, on the second call, AddExchange sees BoundReferences in the existing ExchangeOperator and cannot figure out that we are already using the same partitioning. As a result, an extra ExchangeOperator is inserted.

I think in CommandStrategy we should just use the sparkPlan (sparkPlan is the input of prepareForExecution) to initialize the ExplainCommand instead of using executedPlan.

> extra unnecessary exchange operator in group by
> -----------------------------------------------
>
>                 Key: SPARK-2176
>                 URL: https://issues.apache.org/jira/browse/SPARK-2176
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Yin Huai
>
> {code}
> hql("explain select * from src group by key").collect().foreach(println)
> [ExplainCommand [plan#27:0]]
> [ Aggregate false, [key#25], [key#25,value#26]]
> [  Exchange (HashPartitioning [key#25:0], 200)]
> [   Exchange (HashPartitioning [key#25:0], 200)]
> [    Aggregate true, [key#25], [key#25]]
> [     HiveTableScan [key#25,value#26], (MetastoreRelation default, src, None), None]
> {code}
> There are two exchange operators.
> However, if we do not use explain...
> {code}
> hql("select * from src group by key")
> res4: org.apache.spark.sql.SchemaRDD =
> SchemaRDD[8] at RDD at SchemaRDD.scala:100
> == Query Plan ==
> Aggregate false, [key#8], [key#8,value#9]
>  Exchange (HashPartitioning [key#8:0], 200)
>   Aggregate true, [key#8], [key#8]
>    HiveTableScan [key#8,value#9], (MetastoreRelation default, src, None), None
> {code}
> The plan is fine.
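To make the double-preparation problem described in the comment concrete, here is a minimal sketch in plain Scala. It is a toy model, not Spark's code: the case classes only borrow the names used in this thread, `Scan` stands in for HiveTableScan, and the rules `addExchange` and `bindReferences` are simplified assumptions about what AddExchange and reference binding do. In particular, the sketch assumes binding only rewrites the expressions held by an Exchange, which is what the `key#25:0` entries in the explain output suggest.

```scala
// Toy model only: these classes imitate names from the thread, not Spark's real API.
sealed trait Expr
case class AttributeReference(name: String, id: Int) extends Expr
case class BoundReference(ordinal: Int) extends Expr

case class HashPartitioning(exprs: Seq[Expr], numPartitions: Int)

sealed trait Plan
case class Scan(output: Seq[AttributeReference]) extends Plan
case class Exchange(partitioning: HashPartitioning, child: Plan) extends Plan
case class Aggregate(partial: Boolean, grouping: Seq[Expr], child: Plan) extends Plan

object DoublePrepare {
  // Attributes produced by a plan (simplified: everything passes the scan output through).
  def output(p: Plan): Seq[AttributeReference] = p match {
    case Scan(out)          => out
    case Exchange(_, c)     => output(c)
    case Aggregate(_, _, c) => output(c)
  }

  // Only an Exchange reports a partitioning in this model.
  def outputPartitioning(p: Plan): Option[HashPartitioning] = p match {
    case Exchange(part, _) => Some(part)
    case _                 => None
  }

  // Simplified AddExchange: a final (non-partial) aggregate needs its child
  // hash-partitioned on the grouping expressions; otherwise insert an Exchange.
  def addExchange(p: Plan): Plan = p match {
    case Aggregate(partial, grouping, child) =>
      val newChild = addExchange(child)
      val required = HashPartitioning(grouping, 200)
      if (partial || outputPartitioning(newChild) == Some(required))
        Aggregate(partial, grouping, newChild)
      else
        Aggregate(partial, grouping, Exchange(required, newChild))
    case Exchange(part, c) => Exchange(part, addExchange(c))
    case s: Scan           => s
  }

  // Simplified binding: replace attribute references in an Exchange by ordinals.
  def bindReferences(p: Plan): Plan = p match {
    case Exchange(HashPartitioning(es, n), c) =>
      val in = output(c)
      val bound = es.map {
        case a: AttributeReference => BoundReference(in.indexOf(a))
        case b: BoundReference     => b
      }
      Exchange(HashPartitioning(bound, n), bindReferences(c))
    case Aggregate(partial, g, c) => Aggregate(partial, g, bindReferences(c))
    case s: Scan                  => s
  }

  def prepareForExecution(p: Plan): Plan = bindReferences(addExchange(p))

  def countExchanges(p: Plan): Int = p match {
    case Exchange(_, c)     => 1 + countExchanges(c)
    case Aggregate(_, _, c) => countExchanges(c)
    case _: Scan            => 0
  }

  // select * from src group by key, before any preparation (the "sparkPlan").
  val key = AttributeReference("key", 25)
  val value = AttributeReference("value", 26)
  val sparkPlan: Plan =
    Aggregate(partial = false, Seq(key), Aggregate(partial = true, Seq(key), Scan(Seq(key, value))))

  def main(args: Array[String]): Unit = {
    val executedPlan = prepareForExecution(sparkPlan)
    println(countExchanges(executedPlan))                      // prepared once
    println(countExchanges(prepareForExecution(executedPlan))) // prepared twice
  }
}
```

In this model, preparing `sparkPlan` once yields a single Exchange, while preparing the already prepared plan again yields two, because the required partitioning is expressed with an AttributeReference and the existing one with a BoundReference. Passing `sparkPlan` rather than `executedPlan` to the explain command, as proposed above, avoids the second preparation.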