Prateek Rungta created HIVE-11742:
-------------------------------------

             Summary: last_value window specifier enforces ordering as a 
partition
                 Key: HIVE-11742
                 URL: https://issues.apache.org/jira/browse/HIVE-11742
             Project: Hive
          Issue Type: Bug
          Components: PTF-Windowing
            Reporter: Prateek Rungta


[HIVE-4262|https://issues.apache.org/jira/browse/HIVE-4262] changed the 
partitioning behavior of the last_value function. For a specified last_value() 
OVER X. The ordering spec within X is used in addition to the partition spec 
for partitioning. i.e. last_value(a) OVER (PARTITION BY i ORDER BY j) operates 
last_value(a) on all rows within the unique combination of (i,j). The behavior 
I'd expect is for PARTITION BY i to define the partitioning, and ORDER BY to 
define the ordering within the PARTITION. i.e. last_value(a) OVER (PARTITION BY 
i ORDER BY j) should operate last_value(a) on all rows within the unique values 
of (i), ordered by j within the partition. 

This was changed to be consistent with how SQLServer handled such queries. 
[SQLServer 
Docs|https://msdn.microsoft.com/en-us/library/hh231517.aspx?f=255&MSPPError=-2147217396]
 describe their example (which performs as Hive does): 

{quote}
The PARTITION BY clause partitions the employees by department and the 
LAST_VALUE function is applied to each partition independently. The ORDER BY 
clause specified in the OVER clause determines the logical order in which the 
LAST_VALUE function is applied to the rows in each partition.
{quote}

To me, their behavior is inconsistent with their description. I've filled an 
[upstream bug|https://connect.microsoft.com/SQLServer/feedback/details/1753482] 
with Microsoft for the same. 

[Oracle|https://oracle-base.com/articles/misc/first-value-and-last-value-analytic-functions]
 and 
[Redshift|http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_firstlast_WF.html]
 both exhibit the behavior I'd expect.

Considering Hive-4262 has been in core for 2+ years, I don't think we can 
change the behavior without potentially impacting clients. But I would like a 
way to enable the expected behavior at the least (behind a config flag maybe?). 
What do you think?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to