[ https://issues.apache.org/jira/browse/SPARK-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xin Wu updated SPARK-14361: --------------------------- Description: The current Spark SQL does not support the {code}exclude{code} clause in Window function framing clause, which is part of ANSI SQL2003's (was: The current Spark SQL does not support the `exclusion` clause, which is part of ANSI SQL2003’s `Window` syntax. For example, IBM Netezza fully supports it as shown in the [document web link] (https://www.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.dbu.doc/c_dbuser_wi ndow_aggregation_family_syntax.html). We propose to support it in this JIRA. ##### Introduction Below is the ANSI SQL2003’s `Window` syntax: ``` FUNCTION_NAME(expr) OVER {window_name | (window_specification)} window_specification ::= [window_name] [partitioning] [ordering] [framing] partitioning ::= PARTITION BY value[, value...] [COLLATE collation_name] ordering ::= ORDER [SIBLINGS] BY rule[, rule...] rule ::= {value | position | alias} [ASC | DESC] [NULLS {FIRST | LAST}] framing ::= {ROWS | RANGE} {start | between} [exclusion] start ::= {UNBOUNDED PRECEDING | unsigned-integer PRECEDING | CURRENT ROW} between ::= BETWEEN bound AND bound bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer FOLLOWING} exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES | EXCLUDE NO OTHERS} ``` Exclusion clause can be used to excluded certain rows from the window framing when calculating window aggregation function (e.g. AVG, SUM, MAX, MIN, COUNT, etc) related to current row. Types of window functions that are not supported are listed below: 1. Offset functions, such as lead(), lag() 2. Ranking functions, such as rank(), dense_rank(), percent_rank(), cume_dist, ntile() 3. Row number function, such as row_number() ##### Definition Syntax | Description ------------ | ------------- EXCLUDE CURRENT ROW | Specifies excluding the current row. EXCLUDE GROUP | Specifies excluding the current row and all rows that are tied with it. Ties occur when there is a match on the order column or columns. EXCLUDE NO OTHERS | Specifies not excluding any rows. This value is the default if you specify no exclusion. EXCLUDE TIES | Specifies excluding all rows that are tied with the current row (peer rows), but retaining the current row. ##### Use-case Examples: - Let's say you want to find out for every employee, where is his/her salary at compared to the average salary of those within the same department and whose ages are within 5 years younger and older. The query could be: ```SQL SELECT NAME, DEPT_ID, SALARY, AGE, AVG(SALARY) AS AVG_WITHIN_5_YEAR OVER(PARTITION BY DEPT_ID ORDER BY AGE RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING EXCLUDE CURRENT ROW) FROM EMPLOYEE ``` - Let's say you want to compare every customer's yearly purchase with other customers' average yearly purchase who are at different age group from the current customer. The query could be: ```SQL SELECT CUST_NAME, AGE, PROD_CATEGORY, YEARLY_PURCHASE, AVG(YEARLY_PURCHASE) OVER(PARTITION BY PROD_CATEGORY ORDER BY AGE RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUND FOLLOWING EXCLUDE GROUP) FROM CUSTOMER_PURCHASE_SUM ```) > Support EXCLUDE clause in Window function framing > ------------------------------------------------- > > Key: SPARK-14361 > URL: https://issues.apache.org/jira/browse/SPARK-14361 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.0.0 > Reporter: Xin Wu > > The current Spark SQL does not support the {code}exclude{code} clause in > Window function framing clause, which is part of ANSI SQL2003's -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org