[ 
https://issues.apache.org/jira/browse/HIVE-16255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792824#comment-16792824
 ] 

Laszlo Bodor commented on HIVE-16255:
-------------------------------------

I discussed this with [~kgyrtkirk] again. Because of HIVE-21338, the 
aggregation function cannot rely on elements arriving in order within an 
aggregate, so the original implementation (05.patch), which sorts manually on 
terminate, is sufficient.
Additionally, I refactored it in 06.patch to eliminate the code duplication 
between the LongWritable and DoubleWritable variants.
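
For readers who have not opened the patches, the idea of sorting on terminate 
can be sketched roughly as below. This is only an illustration, not the code 
from 05.patch or 06.patch: the class name, the method names and the use of a 
plain list of doubles are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the code from the attached patches.
class SortOnTerminateSketch {
    // Values are buffered in whatever order the rows happen to arrive.
    private final List<Double> values = new ArrayList<>();

    // Called once per row; nulls are discarded.
    void iterate(Double value) {
        if (value != null) {
            values.add(value);
        }
    }

    // Sorts the buffered values first, then interpolates linearly between
    // the two neighbouring values (percentile_cont style).
    Double terminate(double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        if (values.isEmpty()) {
            return null;
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        double position = fraction * (sorted.size() - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double lo = sorted.get(lower);
        double hi = sorted.get(upper);
        return lo + (hi - lo) * (position - lower);
    }
}
```

Because the sort happens inside terminate, feeding 30, 10, 40, 20 in any order 
gives the same result for a given fraction.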

WITHIN GROUP followup is: HIVE-21449

cc: [~ashutoshc], [~mgergely]

> Support percentile_cont / percentile_disc
> -----------------------------------------
>
>                 Key: HIVE-16255
>                 URL: https://issues.apache.org/jira/browse/HIVE-16255
>             Project: Hive
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Carter Shanklin
>            Assignee: Laszlo Bodor
>            Priority: Major
>         Attachments: HIVE-16255.01.patch, HIVE-16255.02.patch, 
> HIVE-16255.03.patch, HIVE-16255.04.patch, HIVE-16255.05.patch, 
> HIVE-16255.06.patch
>
>
> Way back in HIVE-259, a percentile function was added that provides a subset 
> of the standard percentile_cont aggregate function.
> The SQL standard provides some additional options and also a percentile_disc 
> aggregate function with different rules. In the standard you specify an 
> ordering with arbitrary value expression and the results are drawn from this 
> value expression. These aggregate functions should be usable as analytic 
> functions as well (i.e. support the over clause). The current percentile 
> function can already be used with an over clause.
> The rough outline of how this works is:
> percentile_cont(number) within group (order by expression) [ over(window spec) ]
> percentile_disc(number) within group (order by expression) [ over(window spec) ]
> The value of number should be between 0 and 1. The value expression is 
> evaluated for each row of the group, nulls are discarded, and the remaining 
> rows are ordered. The result is then determined as follows:
> - If PERCENTILE_CONT is specified, by considering the pair of consecutive 
> rows that are indicated by the argument, treated as a fraction of the total 
> number of rows in the group, and interpolating the value of the value 
> expression evaluated for these rows.
> - If PERCENTILE_DISC is specified, by treating the group as a window 
> partition of the CUME_DIST window function, using the specified ordering of 
> the value expression as the window ordering, and returning the first value 
> expression whose cumulative distribution value is greater than or equal to 
> the argument.
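
The two rules quoted above can be compared side by side with a small 
standalone sketch. It is an illustration under assumptions, not Hive code: the 
class and method names are hypothetical, the input is a plain list of doubles, 
and linear interpolation at position fraction * (n - 1) is one common reading 
of the PERCENTILE_CONT rule.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the two definitions; names are illustrative only.
class PercentileDefinitionsSketch {

    // Discards nulls and orders the remaining values ascending.
    static List<Double> sortedNonNull(List<Double> input) {
        List<Double> sorted = new ArrayList<>();
        for (Double v : input) {
            if (v != null) {
                sorted.add(v);
            }
        }
        Collections.sort(sorted);
        return sorted;
    }

    // Interpolates between the two consecutive rows around the position
    // fraction * (n - 1) in the ordered group.
    static Double percentileCont(double fraction, List<Double> input) {
        List<Double> sorted = sortedNonNull(input);
        if (sorted.isEmpty()) {
            return null;
        }
        double position = fraction * (sorted.size() - 1);
        int lower = (int) Math.floor(position);
        int upper = (int) Math.ceil(position);
        double lo = sorted.get(lower);
        double hi = sorted.get(upper);
        return lo + (hi - lo) * (position - lower);
    }

    // Returns the first value whose cumulative distribution is >= fraction.
    // (i + 1) / n is a lower bound on CUME_DIST for row i; tied rows share
    // one value, so the first row passing this check yields the same value.
    static Double percentileDisc(double fraction, List<Double> input) {
        List<Double> sorted = sortedNonNull(input);
        int n = sorted.size();
        for (int i = 0; i < n; i++) {
            if ((double) (i + 1) / n >= fraction) {
                return sorted.get(i);
            }
        }
        return null; // empty group
    }
}
```

For the values 10, 20, 30, 40 and a fraction of 0.5, this sketch's 
percentileCont interpolates between 20 and 30 and returns 25.0, while 
percentileDisc returns 20.0, an actual value from the group.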



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
