GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/21187

    [SPARK-24035][SQL] SQL syntax for Pivot

    ## What changes were proposed in this pull request?
    
    Add SQL support for Pivot following the Pivot grammar defined by Oracle 
(https://docs.oracle.com/database/121/SQLRF/img_text/pivot_clause.htm), with 
some simplifications based on the existing functionality and limitations of 
Pivot in the backend:
    1. For pivot_for_clause 
(https://docs.oracle.com/database/121/SQLRF/img_text/pivot_for_clause.htm), the 
column list form is not supported, which means the pivot column can only be a 
single column.
    2. For pivot_in_clause 
(https://docs.oracle.com/database/121/SQLRF/img_text/pivot_in_clause.htm), the 
sub-query form and "ANY" are not supported (Oracle only supports these for XML 
anyway).
    3. For pivot_in_clause, aliases for the constant values are not supported.
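To make the supported subset concrete, here is a minimal Python sketch of what a pivot computes under the restrictions above: exactly one pivot column, and an IN list of constant values without aliases. It is a hypothetical illustration, not Spark's implementation; the function `pivot`, its parameters, and the choice to name each output column with `str(value)` are all assumptions.

```python
def pivot(rows, group_by, pivot_column, in_values, value_column, agg=sum):
    """Toy PIVOT over a list of dict rows (hypothetical helper, not Spark code).

    rows         -- input rows, each a dict of column name -> value
    group_by     -- grouping columns (passed explicitly in this sketch)
    pivot_column -- a single column name, matching restriction 1
    in_values    -- constant values from the IN list, no aliases (restrictions 2, 3)
    value_column -- column whose values are fed to the aggregate function `agg`
    """
    groups = {}  # group key -> {pivot value: collected values}; keeps first-seen order
    for row in rows:
        key = tuple(row[c] for c in group_by)
        cells = groups.setdefault(key, {v: [] for v in in_values})
        pv = row[pivot_column]
        if pv in cells:  # rows whose pivot value is not listed contribute nothing
            cells[pv].append(row[value_column])

    result = []
    for key, cells in groups.items():
        out = dict(zip(group_by, key))
        for v in in_values:
            # One output column per IN value; with no alias, name it after the value.
            # An empty cell yields None, like SQL's SUM over no rows.
            out[str(v)] = agg(cells[v]) if cells[v] else None
        result.append(out)
    return result
```

Called with `group_by=["course"]`, `pivot_column="year"`, `in_values=(2012, 2013)` and `value_column="earnings"`, this corresponds roughly to `SELECT * FROM courseSales PIVOT (sum(earnings) FOR year IN (2012, 2013))` on a hypothetical `courseSales(course, year, earnings)` table. Here `group_by` is passed explicitly, whereas the SQL form has to infer it.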
    
    The code changes are:
    1. Add parser support for Pivot. Note that according to 
https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#i2076542, Pivot 
cannot be used together with lateral views in the FROM clause. This restriction 
is enforced in the parser rule.
    2. Infer group-by expressions: group-by expressions are not explicitly 
specified in the SQL Pivot clause and must be deduced according to this rule: 
https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#CHDFAFIE, so 
they have to be filled in at the query analysis stage.
    3. Override Pivot.resolved as "false": for the reason mentioned in item 2, 
and because the output attributes change once Pivot is replaced by a Project or 
an Aggregate, we avoid resolving references until Pivot has been resolved and 
replaced.
    4. Verify aggregate expressions: only aggregate expressions, with or without 
aliases, can appear in the first part of the Pivot clause. This check is 
performed at the analysis stage.
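Items 2 and 4 of the code changes can be illustrated with a small Python sketch over a hypothetical tuple-based expression model (not Spark's Catalyst classes). It assumes the Oracle rule linked in item 2 amounts to implicit grouping by every input column the Pivot clause does not reference (neither in the aggregate expressions nor as the pivot column), and it reads item 4 as "each expression is an aggregate function call, optionally wrapped in one alias". The function names and the set of aggregate function names are illustrative assumptions.

```python
# Hypothetical mini expression model (illustration only):
#   ("col", name)            column reference
#   ("lit", value)           literal
#   ("alias", child, name)   aliased expression
#   (func_name, *children)   function call, e.g. ("sum", ("col", "earnings"))

AGGREGATE_FUNCTIONS = {"sum", "avg", "count", "min", "max"}  # illustrative subset


def references(expr):
    """Return the set of column names referenced anywhere inside `expr`."""
    kind = expr[0]
    if kind == "col":
        return {expr[1]}
    if kind == "lit":
        return set()
    if kind == "alias":
        return references(expr[1])
    refs = set()
    for child in expr[1:]:
        refs |= references(child)
    return refs


def infer_group_by(input_columns, aggregates, pivot_column):
    """Item 2: group by every input column the Pivot clause does not mention."""
    mentioned = {pivot_column}
    for agg in aggregates:
        mentioned |= references(agg)
    # Keep the input column order so the inferred grouping is deterministic.
    return [c for c in input_columns if c not in mentioned]


def verify_aggregates(aggregates):
    """Item 4: reject anything other than an (optionally aliased) aggregate call."""
    for agg in aggregates:
        inner = agg[1] if agg[0] == "alias" else agg
        if inner[0] not in AGGREGATE_FUNCTIONS:
            raise ValueError(f"aggregate expression required for pivot, got {agg!r}")
```

For example, with input columns (course, year, earnings), the aggregate `sum(earnings)` and pivot column `year`, the inferred grouping is just `course`; an expression such as `sum(e) + 1` is rejected because its top level is not an aggregate call.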
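The reasoning in item 3 of the code changes can be modeled with a toy analyzer in Python. This is a hypothetical sketch, not Spark's Analyzer: the classes `Plan`, `Scan`, `Aggregate`, `Pivot` and `Project`, the two rule functions and `analyze` are all made up for illustration. The assumed key behavior is that a node binds its references only once all of its children report `resolved`, so a `Pivot` that always reports `False` holds back its parent until a rule has replaced it.

```python
class Plan:
    """Toy logical-plan node: resolved once all children are resolved."""

    def __init__(self, *children):
        self.children = list(children)

    @property
    def resolved(self):
        return all(c.resolved for c in self.children)


class Scan(Plan):
    """Leaf relation; with no children it is trivially resolved."""


class Aggregate(Plan):
    """Stand-in for the node a Pivot is rewritten into."""


class Pivot(Plan):
    @property
    def resolved(self):
        # Mirrors the override described in item 3: never report resolved, so
        # nothing above it binds references against its not-yet-final output.
        return False


class Project(Plan):
    def __init__(self, child):
        super().__init__(child)
        self.refs_bound = False  # have this node's column references been bound?

    @property
    def resolved(self):
        return self.refs_bound and super().resolved


def resolve_references(plan):
    """Bind a Project's references, but only once all its children are resolved."""
    for child in plan.children:
        resolve_references(child)
    if isinstance(plan, Project) and all(c.resolved for c in plan.children):
        plan.refs_bound = True
    return plan


def resolve_pivot(plan):
    """Replace a Pivot whose child is resolved with an Aggregate."""
    plan.children = [resolve_pivot(c) for c in plan.children]
    if isinstance(plan, Pivot) and all(c.resolved for c in plan.children):
        return Aggregate(*plan.children)
    return plan


def analyze(plan, max_passes=10):
    """Run both rules repeatedly; return (plan, passes) once the plan is resolved."""
    for passes in range(1, max_passes + 1):
        plan = resolve_references(plan)
        plan = resolve_pivot(plan)
        if plan.resolved:
            return plan, passes
    raise RuntimeError("plan still unresolved")
```

Analyzing `Project(Pivot(Scan()))` takes two passes: in the first, the Project's references are left alone because its child Pivot is unresolved, and the Pivot is then rewritten; in the second, the Project binds against the Aggregate that replaced it.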
    
    ## How was this patch tested?
    
    A new test suite PivotSuite is added.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-24035

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21187
    
----
commit c486c6b15de49a519c728d037a8979791ea37e74
Author: maryannxue <maryann.xue@...>
Date:   2018-04-28T01:17:52Z

    [SPARK-24035] SQL syntax for Pivot

----

