Re: computing median and percentiles
The short answer is that there is no native Hive UDF that solves your particular case, which means you have to solve it yourself. I searched for something like what you were looking for and found this general recipe:

http://www.onlinemathlearning.com/median-frequency-table.html

Off the top of my head I'm not sure how easy this would be in SQL, but I imagine a clever person could do it using the ROW_NUMBER() function. And if not SQL, then perhaps a custom UDF. Ultimately you have to do the work, but now you have a potential recipe to follow. :)

On Wed, Mar 19, 2014 at 9:37 PM, Seema Datar sda...@yahoo-inc.com wrote:

Not really. If it were a single column with no counters, Hive provides the option to use percentile. So basically, if the data were like:

100 100 200 200 200 200 300

But if we have two columns, one that maintains the value and the other that maintains the count, how can Hive be used to derive the percentile?

Value  Count
100    2
200    4
300    1

Thanks,
Seema

From: Stephen Sprague sprag...@gmail.com
Reply-To: user@hive.apache.org
Date: Thursday, March 20, 2014 5:28 AM
To: user@hive.apache.org
Subject: Re: computing median and percentiles

Not a Hive question, is it? It's more like a math question.

On Wed, Mar 19, 2014 at 1:30 PM, Seema Datar sda...@yahoo-inc.com wrote:

I understand the percentile function is supported in Hive in the latest versions. However, how does one calculate percentiles when the data is spread across two columns? So say:

Value  Count
100    2   (so basically 100 occurred twice)
200    4
300    1
400    6
500    3

I want to find the 0.25 percentile for the value distribution. How can I do it using the Hive percentile function?
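For what it's worth, the frequency-table recipe can be sketched outside Hive in a few lines of Python. This is only an illustration under stated assumptions, not a Hive feature: the function names `percentile_from_counts` and `value_at_rank` are made up for this example, and the percentile is assumed to be defined by linear interpolation at the zero-based position p * (N - 1), where N is the sum of the counts. The running total of the counts plays the role that ROW_NUMBER() would play over the expanded rows.

```python
from bisect import bisect_right
from itertools import accumulate


def value_at_rank(values, cum_counts, rank):
    """Return the value at a zero-based rank of the expanded, sorted data.

    cum_counts[i] is the number of observations <= values[i], so the first
    cumulative count strictly greater than `rank` marks the owning value.
    """
    return values[bisect_right(cum_counts, rank)]


def percentile_from_counts(rows, p):
    """Percentile of a (value, count) table without expanding it.

    Assumption (hypothetical helper, not Hive's API): linear interpolation
    between the two ranks surrounding position p * (N - 1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    rows = sorted((v, c) for v, c in rows if c > 0)
    if not rows:
        raise ValueError("no observations")

    values = [v for v, _ in rows]
    cum_counts = list(accumulate(c for _, c in rows))  # running totals
    total = cum_counts[-1]

    position = p * (total - 1)         # fractional zero-based rank
    lower = int(position)              # floor, since position >= 0
    upper = min(lower + 1, total - 1)  # clamp at the last observation
    frac = position - lower

    lo = value_at_rank(values, cum_counts, lower)
    hi = value_at_rank(values, cum_counts, upper)
    return lo + frac * (hi - lo)


table = [(100, 2), (200, 4), (300, 1), (400, 6), (500, 3)]
print(percentile_from_counts(table, 0.25))  # 200.0
print(percentile_from_counts(table, 0.5))   # 400.0
```

For the table in the original question there are 16 observations, so the 0.25 position is 3.75, which falls between two observations that are both 200. The same cumulative-count idea should carry over to SQL with a running sum of Count ordered by Value, although that is left as an exercise.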
Re: computing median and percentiles
I understand the percentile function is supported in Hive in the latest versions. However, how does one calculate percentiles when the data is spread across two columns? So say:

Value  Count
100    2   (so basically 100 occurred twice)
200    4
300    1
400    6
500    3

I want to find the 0.25 percentile for the value distribution. How can I do it using the Hive percentile function?
Re: computing median and percentiles
Not a Hive question, is it? It's more like a math question.

On Wed, Mar 19, 2014 at 1:30 PM, Seema Datar sda...@yahoo-inc.com wrote:

I understand the percentile function is supported in Hive in the latest versions. However, how does one calculate percentiles when the data is spread across two columns? So say:

Value  Count
100    2   (so basically 100 occurred twice)
200    4
300    1
400    6
500    3

I want to find the 0.25 percentile for the value distribution. How can I do it using the Hive percentile function?
Re: computing median and percentiles
Not really. If it were a single column with no counters, Hive provides the option to use percentile. So basically, if the data were like:

100 100 200 200 200 200 300

But if we have two columns, one that maintains the value and the other that maintains the count, how can Hive be used to derive the percentile?

Value  Count
100    2
200    4
300    1

Thanks,
Seema

From: Stephen Sprague sprag...@gmail.com
Reply-To: user@hive.apache.org
Date: Thursday, March 20, 2014 5:28 AM
To: user@hive.apache.org
Subject: Re: computing median and percentiles

Not a Hive question, is it? It's more like a math question.

On Wed, Mar 19, 2014 at 1:30 PM, Seema Datar sda...@yahoo-inc.com wrote:

I understand the percentile function is supported in Hive in the latest versions. However, how does one calculate percentiles when the data is spread across two columns? So say:

Value  Count
100    2   (so basically 100 occurred twice)
200    4
300    1
400    6
500    3

I want to find the 0.25 percentile for the value distribution. How can I do it using the Hive percentile function?
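To make the equivalence between the two layouts concrete, here is a small Python sketch (the helper name `expand_counts` is hypothetical, chosen just for this example). It turns the three-row Value/Count table into the single-column form, after which any ordinary single-column statistic applies.

```python
from statistics import median


def expand_counts(rows):
    """Repeat each value `count` times, in ascending value order."""
    return [value for value, count in sorted(rows) for _ in range(count)]


table = [(100, 2), (200, 4), (300, 1)]
flat = expand_counts(table)
print(flat)          # [100, 100, 200, 200, 200, 200, 300]
print(median(flat))  # 200
```

Expanding like this is only practical when the counts are small, since the expanded data has as many entries as the sum of the counts.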