[ https://issues.apache.org/jira/browse/HIVE-24804?focusedWorklogId=611053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611053 ]
ASF GitHub Bot logged work on HIVE-24804: ----------------------------------------- Author: ASF GitHub Bot Created on: 15/Jun/21 00:09 Start Date: 15/Jun/21 00:09 Worklog Time Spent: 10m Work Description: github-actions[bot] closed pull request #2000: URL: https://github.com/apache/hive/pull/2000 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 611053) Time Spent: 0.5h (was: 20m) > Introduce check: RANGE with offset PRECEDING/FOLLOWING requires exactly one > ORDER BY column > ------------------------------------------------------------------------------------------- > > Key: HIVE-24804 > URL: https://issues.apache.org/jira/browse/HIVE-24804 > Project: Hive > Issue Type: Bug > Reporter: László Bodor > Assignee: László Bodor > Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, in Hive, we can run a windowing function with range specification > but without an ORDER BY clause: > {code} > create table vector_ptf_part_simple_text(p_mfgr string, p_name string, > p_retailprice double, rowindex string); > select p_mfgr, p_name, rowindex, > count(*) over(partition by p_mfgr range between 1 preceding and current row) > as cs1, > count(*) over(partition by p_mfgr range between 3 preceding and current row) > as cs2 > from vector_ptf_part_simple_text; > {code} > This is confusing, because without an order by clause, the range is out of > context, we don't know by which column should we calculate the range. > Tested on Postgres, it throws an exception: > {code} > create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name > varchar(10), p_retailprice integer, rowindex varchar(10)); > select p_mfgr, p_name, rowindex, > count(*) over(partition by p_mfgr range between 1 preceding and current row) > as cs1, > count(*) over(partition by p_mfgr range between 3 preceding and current row) > as cs2 > from vector_ptf_part_simple_text; > *RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column* > {code} > further references: > https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts > {code} > RANGE: Computes the window frame based on a logical range of rows around the > current row, based on the current row’s ORDER BY key value. The provided > range value is added or subtracted to the current row's key value to define a > starting or ending range boundary for the window frame. In a range-based > window frame, there must be exactly one expression in the ORDER BY clause, > and the expression must have a numeric type. > {code} > https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html > {code} > Without ORDER BY: The default frame includes all partition rows (because, > without ORDER BY, all partition rows are peers). The default is equivalent to > this frame specification: > RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING > {code} > I believe this one could only make sense if you don't specify range, > otherwise the sql statement reflects a different thing from which is returned > by the engine -- This message was sent by Atlassian Jira (v8.3.4#803005)