[GitHub] spark issue #21291: [SPARK-24242][SQL] RangeExec should have correct outputO...

viirya Thu, 10 May 2018 22:24:16 -0700

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21291
  
    Changing `outputPartitioning` changes the executed plans.
    
    E.g., in `WholeStageCodegenSuite`, the executed plan of the query `spark.range(3).groupBy("id").count().orderBy("id")` changes from
    
    ```
    *(3) Sort [id#22L ASC NULLS FIRST], true, 0
    +- Exchange rangepartitioning(id#22L ASC NULLS FIRST, 5)
       +- *(2) HashAggregate(keys=[id#22L], functions=[count(1)], output=[id#22L, count#26L])
          +- Exchange hashpartitioning(id#22L, 5)
             +- *(1) HashAggregate(keys=[id#22L], functions=[partial_count(1)], output=[id#22L, count#31L])
                +- *(1) Range (0, 3, step=1, splits=2)
    ```
    to
    
    ```
    *(1) Sort [id#22L ASC NULLS FIRST], true, 0
    +- *(1) HashAggregate(keys=[id#22L], functions=[count(1)], output=[id#22L, count#26L])
       +- *(1) HashAggregate(keys=[id#22L], functions=[partial_count(1)], output=[id#22L, count#31L])
          +- *(1) Range (0, 3, step=1, splits=2)
    ```
    
    I will update related tests.
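
To make the difference between the two plans concrete, here is a minimal sketch that counts the `Exchange` operators and the distinct `*(n)` prefixes in each plan. It works only on the plan text quoted above and does not run Spark. The helper name `summarize` is hypothetical, and reading `*(n)` as a whole-stage codegen stage id is an assumption about how these plans are printed.

```python
import re

# Plan text copied from the comment above, with wrapped lines rejoined.
PLAN_BEFORE = """\
*(3) Sort [id#22L ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(id#22L ASC NULLS FIRST, 5)
   +- *(2) HashAggregate(keys=[id#22L], functions=[count(1)], output=[id#22L, count#26L])
      +- Exchange hashpartitioning(id#22L, 5)
         +- *(1) HashAggregate(keys=[id#22L], functions=[partial_count(1)], output=[id#22L, count#31L])
            +- *(1) Range (0, 3, step=1, splits=2)
"""

PLAN_AFTER = """\
*(1) Sort [id#22L ASC NULLS FIRST], true, 0
+- *(1) HashAggregate(keys=[id#22L], functions=[count(1)], output=[id#22L, count#26L])
   +- *(1) HashAggregate(keys=[id#22L], functions=[partial_count(1)], output=[id#22L, count#31L])
      +- *(1) Range (0, 3, step=1, splits=2)
"""

# Assumption: a leading "*(n)" marks an operator belonging to codegen stage n.
STAGE_PREFIX = re.compile(r"^\*\((\d+)\)")


def summarize(plan):
    """Hypothetical helper: return (Exchange count, sorted stage ids) for a plan string."""
    exchanges = 0
    stages = set()
    for line in plan.splitlines():
        # Drop the tree-drawing prefix (spaces, '+', '-') in front of the operator.
        node = line.lstrip(" +-")
        if node.startswith("Exchange "):
            exchanges += 1
        match = STAGE_PREFIX.match(node)
        if match:
            stages.add(int(match.group(1)))
    return exchanges, sorted(stages)


print(summarize(PLAN_BEFORE))  # (2, [1, 2, 3])
print(summarize(PLAN_AFTER))   # (0, [1])
```

Under these assumptions, both `Exchange` operators disappear and the three stages collapse into one, which is why tests that assert on the plan shape need updating.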


