GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/22519

    [SPARK-25505][SQL] The output order of grouping columns in Pivot is different from the input order

    ## What changes were proposed in this pull request?
    
    The grouping columns of a Pivot query are inferred as "input columns -
    pivot columns - pivot aggregate columns", where the input columns are the
    output of Pivot's child relation. The grouping columns become the leading
    columns of the pivot output, and they should keep the order in which the
    input specifies them. For example:
    ```
    SELECT * FROM (
      SELECT course, earnings, "a" as a, "z" as z, "b" as b, "y" as y, "c" as c, "x" as x, "d" as d, "w" as w
      FROM courseSales
    )
    PIVOT (
      sum(earnings)
      FOR course IN ('dotNET', 'Java')
    )
    ```
    The output columns should be "a, z, b, y, c, x, d, w, ..." but are
    currently "a, b, c, d, w, x, y, z, ...".
    
    The fix is to derive the grouping columns from the child plan's `output`
    (an ordered sequence) instead of `outputSet`, so that the input order is
    preserved.
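
The difference between the two approaches can be sketched outside Spark with plain Python collections. This is an illustration only, not the patch itself: the list `child_output` stands in for the child plan's `output`, the helper names `grouping_from_set` and `grouping_from_seq` are hypothetical, and sorting the set by name is an assumption made here to get a deterministic result that matches the order reported above.

```python
# Illustrative model only: a Python list stands in for the child plan's
# ordered `output`, and a Python set stands in for `outputSet`.
child_output = ["course", "earnings", "a", "z", "b", "y", "c", "x", "d", "w"]
pivot_columns = {"course"}
aggregate_columns = {"earnings"}
excluded = pivot_columns | aggregate_columns


def grouping_from_set(columns, excluded):
    """Set-based inference: the set difference discards the input order.

    Sorting by name is an assumption of this sketch, used to make the
    result deterministic and to mirror the order reported in the PR.
    """
    return sorted(set(columns) - excluded)


def grouping_from_seq(columns, excluded):
    """Sequence-based inference: filtering the ordered list keeps input order."""
    return [c for c in columns if c not in excluded]


print(grouping_from_set(child_output, excluded))
# ['a', 'b', 'c', 'd', 'w', 'x', 'y', 'z']
print(grouping_from_seq(child_output, excluded))
# ['a', 'z', 'b', 'y', 'c', 'x', 'd', 'w']
```

Both functions return the same set of grouping columns; only the sequence-based version returns them in the order the subquery declared them.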
    
    ## How was this patch tested?
    
    Added a unit test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-25505

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22519
    
----
commit bd416bd74ee77329b2527fffecd21f7f90090334
Author: maryannxue <maryannxue@...>
Date:   2018-09-21T14:33:16Z

    [SPARK-25505][SQL] The output order of grouping columns in Pivot is different from the input order

----


