[GitHub] [spark] AngersZhuuuu opened a new pull request #28836: [SPARK-31561][SQL] Add QUALIFY Clause

GitBox Mon, 15 Jun 2020 20:23:16 -0700


AngersZhuuuu opened a new pull request #28836:
URL: https://github.com/apache/spark/pull/28836



   
   ### What changes were proposed in this pull request?
   In a SELECT statement, the QUALIFY clause filters the results of window 
functions.
   QUALIFY does with window functions what HAVING does with aggregate functions 
and GROUP BY clauses.
   In the execution order of a query, QUALIFY is therefore evaluated after 
window functions are computed.
   Examples:
   https://docs.snowflake.com/en/sql-reference/constructs/qualify.html#examples
   
   ### Why are the changes needed?
   Support  QUALIFY sentence.  make SQL more flexiable.
   
   
   ### Does this PR introduce _any_ user-facing change?
   People can use `QUALIFY` like below:
   Create and load a table:
   ```
   create table qt (i integer, p char(1), o integer);
   insert into qt (i, p, o) values
       (1, 'A', 1),
       (2, 'A', 2),
       (3, 'B', 1),
       (4, 'B', 2);
   ```
   This query uses nesting rather than QUALIFY:
   ```
   select * 
       from (
            select i, p, o, 
                   row_number() over (partition by p order by o) as row_num
               from qt
           )
       where row_num = 1
       ;
   +---+---+---+---------+
   | I | P | O | ROW_NUM |
   |---+---+---+---------|
   | 1 | A | 1 |       1 |
   | 3 | B | 1 |       1 |
   +---+---+---+---------+
   ```
   This query uses QUALIFY:
   ```
   select i, p, o
       from qt
       qualify row_number() over (partition by p order by o) = 1
       ;
   +---+---+---+
   | I | P | O |
   |---+---+---|
   | 1 | A | 1 |
   | 3 | B | 1 |
   +---+---+---+
   ```
   You can also use QUALIFY to reference window functions that are in the 
SELECT column list:
   ```
   select i, p, o, row_number() over (partition by p order by o) as row_num
       from qt
       qualify row_num = 1
       ;
   +---+---+---+---------+
   | I | P | O | ROW_NUM |
   |---+---+---+---------|
   | 1 | A | 1 |       1 |
   | 3 | B | 1 |       1 |
   +---+---+---+---------+
   ```
   You can see how the QUALIFY acts as a filter by removing the QUALIFY from 
the previous query and comparing the output:
   ```
   select i, p, o, row_number() over (partition by p order by o) as row_num
       from qt
       ;
   +---+---+---+---------+
   | I | P | O | ROW_NUM |
   |---+---+---+---------|
   | 1 | A | 1 |       1 |
   | 2 | A | 2 |       2 |
   | 3 | B | 1 |       1 |
   | 4 | B | 2 |       2 |
   +---+---+---+---------+
   ```
   The QUALIFY clause can also be combined with aggregates and can have 
subqueries in the predicate, e.g.:
   ```
   select c2, sum(c3) over (partition by c2) as r
     from t1
     where c3 < 4
     group by c2, c3
     having sum(c1) > 3
     qualify r in (
       select min(c1)
         from test
         group by c2
         having min(c1) > 3);
   ```
   
   ### How was this patch tested?
   Added UT
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AngersZhuuuu opened a new pull request #28836: [SPARK-31561][SQL] Add QUALIFY Clause

Reply via email to