AngersZhuuuu opened a new pull request #28836: URL: https://github.com/apache/spark/pull/28836
### What changes were proposed in this pull request? In a SELECT statement, the QUALIFY clause filters the results of window functions. QUALIFY does with window functions what HAVING does with aggregate functions and GROUP BY clauses. In the execution order of a query, QUALIFY is therefore evaluated after window functions are computed. Examples: https://docs.snowflake.com/en/sql-reference/constructs/qualify.html#examples ### Why are the changes needed? Support QUALIFY sentence. make SQL more flexiable. ### Does this PR introduce _any_ user-facing change? People can use `QUALIFY` like below: Create and load a table: ``` create table qt (i integer, p char(1), o integer); insert into qt (i, p, o) values (1, 'A', 1), (2, 'A', 2), (3, 'B', 1), (4, 'B', 2); ``` This query uses nesting rather than QUALIFY: ``` select * from ( select i, p, o, row_number() over (partition by p order by o) as row_num from qt ) where row_num = 1 ; +---+---+---+---------+ | I | P | O | ROW_NUM | |---+---+---+---------| | 1 | A | 1 | 1 | | 3 | B | 1 | 1 | +---+---+---+---------+ ``` This query uses QUALIFY: ``` select i, p, o from qt qualify row_number() over (partition by p order by o) = 1 ; +---+---+---+ | I | P | O | |---+---+---| | 1 | A | 1 | | 3 | B | 1 | +---+---+---+ ``` You can also use QUALIFY to reference window functions that are in the SELECT column list: ``` select i, p, o, row_number() over (partition by p order by o) as row_num from qt qualify row_num = 1 ; +---+---+---+---------+ | I | P | O | ROW_NUM | |---+---+---+---------| | 1 | A | 1 | 1 | | 3 | B | 1 | 1 | +---+---+---+---------+ ``` You can see how the QUALIFY acts as a filter by removing the QUALIFY from the previous query and comparing the output: ``` select i, p, o, row_number() over (partition by p order by o) as row_num from qt ; +---+---+---+---------+ | I | P | O | ROW_NUM | |---+---+---+---------| | 1 | A | 1 | 1 | | 2 | A | 2 | 2 | | 3 | B | 1 | 1 | | 4 | B | 2 | 2 | +---+---+---+---------+ ``` The QUALIFY clause can also be combined with aggregates and can have subqueries in the predicate, e.g.: ``` select c2, sum(c3) over (partition by c2) as r from t1 where c3 < 4 group by c2, c3 having sum(c1) > 3 qualify r in ( select min(c1) from test group by c2 having min(c1) > 3); ``` ### How was this patch tested? Added UT ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org