Xiao Li created SPARK-13235:
-------------------------------

             Summary: Remove extra Distinct in Union Distinct
                 Key: SPARK-13235
                 URL: https://issues.apache.org/jira/browse/SPARK-13235
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Xiao Li


Union Distinct has two Distinct that generates two Aggregation in the plan.

{code}
sql("select * from t0 union select * from t0").explain(true)
{code}

{code}
== Parsed Logical Plan ==
'Project [unresolvedalias(*,None)]
+- 'Subquery u_2
   +- 'Distinct
      +- 'Project [unresolvedalias(*,None)]
         +- 'Subquery u_1
            +- 'Distinct
               +- 'Union
                  :- 'Project [unresolvedalias(*,None)]
                  :  +- 'UnresolvedRelation `t0`, None
                  +- 'Project [unresolvedalias(*,None)]
                     +- 'UnresolvedRelation `t0`, None

== Analyzed Logical Plan ==
id: bigint
Project [id#16L]
+- Subquery u_2
   +- Distinct
      +- Project [id#16L]
         +- Subquery u_1
            +- Distinct
               +- Union
                  :- Project [id#16L]
                  :  +- Subquery t0
                  :     +- Relation[id#16L] ParquetRelation
                  +- Project [id#16L]
                     +- Subquery t0
                        +- Relation[id#16L] ParquetRelation

== Optimized Logical Plan ==
Aggregate [id#16L], [id#16L]
+- Aggregate [id#16L], [id#16L]
   +- Union
      :- Project [id#16L]
      :  +- Relation[id#16L] ParquetRelation
      +- Project [id#16L]
         +- Relation[id#16L] ParquetRelation
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to