[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782120#action_12782120
 ] 

Todd Lipcon commented on HIVE-259:
----------------------------------

An easy way to do this that would work for a ton of data sets would be to 
essentially do a counting sort. If you have only a few thousand distinct values 
in the column to be analyzed, just make a hashtable, count up how many times you 
see each value, and then in the single reducer use the histogram to figure out 
the percentile. This should work great for datasets like age, and even for sets 
like "number of days since user signed up". For sets that are truly continuous, 
this would be useful when combined with a binning UDF to discretize the values.

Sadly it's not the general case, but it would be an easy first step.
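
A minimal sketch of the counting approach, written in Python rather than as 
Hive code, might look like the following. All function names here 
(count_values, merge_counts, percentile_from_histogram, bin_value) are 
hypothetical, and the nearest-rank definition of a percentile is an assumption, 
since the comment does not say which definition to use.

```python
import math
from collections import Counter


def count_values(values):
    """Map side: count how many times each distinct value appears."""
    return Counter(values)


def merge_counts(partials):
    """Reduce side: merge the per-mapper hashtables into one histogram."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def percentile_from_histogram(hist, p):
    """Nearest-rank percentile (assumed definition), for 0 < p <= 100.

    Walks the distinct values in sorted order and returns the first one
    whose cumulative count reaches ceil(p * n / 100).
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    n = sum(hist.values())
    if n == 0:
        raise ValueError("empty histogram")
    rank = math.ceil(p * n / 100)
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= rank:
            return value


def bin_value(x, width):
    """Hypothetical binning step: map a continuous value to its bin's lower edge."""
    return math.floor(x / width) * width


# Two "mappers" each count their own slice of an age column.
partials = [count_values([30, 25, 25, 40]), count_values([35, 30, 30, 50])]
hist = merge_counts(partials)
quartiles = [percentile_from_histogram(hist, p) for p in (25, 50, 75)]
print(quartiles)  # [25, 30, 35]
```

Memory and the final sort scale with the number of distinct values rather than 
the number of rows, which is why this only suits low-cardinality (or binned) 
columns.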

> Add PERCENTILE aggregate function
> ---------------------------------
>
>                 Key: HIVE-259
>                 URL: https://issues.apache.org/jira/browse/HIVE-259
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Venky Iyer
>
> Compute at least the 25th, 50th, and 75th percentiles

