[jira] [Work logged] (HIVE-24804) Introduce check: RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column

ASF GitHub Bot (Jira) Mon, 14 Jun 2021 17:10:12 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-24804?focusedWorklogId=611053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611053
 ]


ASF GitHub Bot logged work on HIVE-24804:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Jun/21 00:09
            Start Date: 15/Jun/21 00:09
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] closed pull request #2000:
URL: https://github.com/apache/hive/pull/2000


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 611053)
    Time Spent: 0.5h  (was: 20m)

> Introduce check: RANGE with offset PRECEDING/FOLLOWING requires exactly one 
> ORDER BY column
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-24804
>                 URL: https://issues.apache.org/jira/browse/HIVE-24804
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, in Hive, we can run a windowing function with range specification 
> but without an ORDER BY clause:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr string, p_name string, 
> p_retailprice double, rowindex string);
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> {code}
> This is confusing, because without an order by clause, the range is out of 
> context, we don't know by which column should we calculate the range.
> Tested on Postgres, it throws an exception:
> {code}
> create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name 
> varchar(10), p_retailprice integer, rowindex varchar(10));
> select p_mfgr, p_name, rowindex,
> count(*) over(partition by p_mfgr range between 1 preceding and current row) 
> as cs1,
> count(*) over(partition by p_mfgr range between 3 preceding and current row) 
> as cs2
> from vector_ptf_part_simple_text;
> *RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column*
> {code}
> further references:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
> {code}
> RANGE: Computes the window frame based on a logical range of rows around the 
> current row, based on the current row’s ORDER BY key value. The provided 
> range value is added or subtracted to the current row's key value to define a 
> starting or ending range boundary for the window frame. In a range-based 
> window frame, there must be exactly one expression in the ORDER BY clause, 
> and the expression must have a numeric type.
> {code}
> https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/window-functions-frames.html
> {code}
> Without ORDER BY: The default frame includes all partition rows (because, 
> without ORDER BY, all partition rows are peers). The default is equivalent to 
> this frame specification:
> RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
> {code}
> I believe this one could only make sense if you don't specify range, 
> otherwise the sql statement reflects a different thing from which is returned 
> by the engine



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (HIVE-24804) Introduce check: RANGE with offset PRECEDING/FOLLOWING requires exactly one ORDER BY column

Reply via email to