[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-05-03 Thread John Sichi (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863415#action_12863415]

John Sichi commented on HIVE-259:
-

PERCENTILE docs are still missing on the consolidated page:

http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Fix For: 0.6.0

 Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, 
 HIVE-259.4.patch, HIVE-259.5.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-04-22 Thread John Sichi (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860061#action_12860061]

John Sichi commented on HIVE-259:
-

I couldn't see the point of having two competing UDF guide pages, so I renamed 
the XPath-specific one as such and linked it from the main one.  Just 
housekeeping to reduce confusion; I did not actually add the percentile info.





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-04-19 Thread Ning Zhang (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858600#action_12858600]

Ning Zhang commented on HIVE-259:
-

Hi Jerome and Zheng, 

Could either of you write up the syntax and semantics of the percentile function 
on the wiki page (http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF or 
http://wiki.apache.org/hadoop/Hive/HiveUDFGuide)?

Thanks,




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread He Yongqiang (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839391#action_12839391]

He Yongqiang commented on HIVE-259:
---

The code looks very good. Thanks for the code work, Jerome and Zheng!
Just some minor comments:
(1) I am not familiar with the exact definition of the percentile function. Must 
the result of percentile() be a member of the input data? 
(2) A HashMap and an ArrayList are used to copy and sort. Can we use a TreeMap 
here? This is minor and can be ignored.
At the beginning of the new test case, 
DESCRIBE FUNCTION percentile;
DESCRIBE FUNCTION EXTENDED percentile;
appear twice.

And this is a very good function to have; it would be great if we could document 
its usage on the wiki page or somewhere.




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread Zheng Shao (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839393#action_12839393]

Zheng Shao commented on HIVE-259:
-

 (1) I am not familiar with the exact definition of percentile function. Is 
 the percentile()'s result must be a member of input data?
See the link above.

 (2) HashMap and ArrayList is used to copy and sort. Can we use tree map here? 
 this is a small and can be ignored.
I think a HashMap is better here. The number of iterate() calls is usually much 
higher than the number of unique values (the size of the HashMap), so using a 
HashMap reduces the cost of each iterate() call.

 In the beginning of new test case, .. appears two times
Fixed in HIVE-259.5.patch





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-28 Thread He Yongqiang (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839394#action_12839394]

He Yongqiang commented on HIVE-259:
---

looks good, will test and commit.




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Jerome Boulon (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838512#action_12838512]

Jerome Boulon commented on HIVE-259:


Can someone explain how I can create and populate a new table to be used by the 
ant test target?





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Carl Steinbach (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838516#action_12838516]

Carl Steinbach commented on HIVE-259:
-

@Jerome: take a look at ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Zheng Shao (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838718#action_12838718]

Zheng Shao commented on HIVE-259:
-

Hi Jerome, using an ArrayList<Integer> won't cause unnecessary object creation. 
We will just create a single ArrayList<Integer> and reuse it.
Does that make sense?





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Todd Lipcon (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838735#action_12838735]

Todd Lipcon commented on HIVE-259:
--

Doesn't the autoboxing of Integer types actually allocate objects? I think the JVM 
only caches boxed Integers for very small values (iirc only from -128 to 127).
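Todd's question can be checked with a few lines of Java. The sketch below (the class name `BoxingCacheDemo` is hypothetical) relies on the boxing rule in the Java Language Specification: boxing an `int` in the range -128 to 127 must always return the same `Integer` object, while values outside that range may be allocated anew on every conversion. So a per-row `ArrayList<Integer>` can cost up to one extra object per boxed value outside the cached range.

```java
// Shows which boxed Integers are guaranteed to be shared.
class BoxingCacheDemo {
    public static void main(String[] args) {
        // The JLS guarantees that boxing an int in [-128, 127] always yields
        // the same Integer object, so this prints true.
        Integer a = 127, b = 127;
        System.out.println("127 same object: " + (a == b));

        // Outside that range the result is JVM-dependent: with default HotSpot
        // settings each boxing usually allocates a new object, but the cache's
        // upper bound can be raised with -XX:AutoBoxCacheMax.
        Integer c = 128, d = 128;
        System.out.println("128 same object: " + (c == d));

        // Comparing values with equals() is correct regardless of caching.
        System.out.println("128 equal values: " + c.equals(d));
    }
}
```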




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-24 Thread Zheng Shao (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838118#action_12838118]

Zheng Shao commented on HIVE-259:
-

Also see http://wiki.apache.org/hadoop/Hive/HowToContribute#Coding_Convention




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-24 Thread Zheng Shao (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838119#action_12838119]

Zheng Shao commented on HIVE-259:
-

The test cases look a bit too trivial, or do the results have problems? They 
always return the same number for the 3 different percentile values.





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-24 Thread Jerome Boulon (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838173#action_12838173]

Jerome Boulon commented on HIVE-259:


- From my point of view, making the variables in the state object private will 
not make the code more readable...
- I'll rename all variables to start with a lowercase letter to match Java style; 
the current variable names are based on the Oracle definition.

@Zheng - I'm not using an ArrayList<Integer> but a String, to avoid unnecessary 
object creation for every single row... It would be even better if the 
constructor could be used, but I haven't found out how to do that. If we care 
about 1 extra empty ArrayList per mapper/spill in memory, then we should care 
about creating (1 ArrayList + 1 Integer object per percentile) per row.

@Zheng - Regarding the test case: that is what I had in mind when I asked you 
how to create my own table, and that is exactly why I posted the jb2.* files.





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Carl Steinbach (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837500#action_12837500]

Carl Steinbach commented on HIVE-259:
-

Please fix the new Checkstyle errors in UDAFPercentile.java:

35: Missing a Javadoc comment.
39: Missing a Javadoc comment.
39:10: 'public' modifier out of order with the JLS suggestions.
41: Missing a Javadoc comment.
41:12: 'public' modifier out of order with the JLS suggestions.
42:15: Variable 'initDone' must be private and have accessor methods.
43:7: Declaring variables, return values or parameters of type 'HashMap' is not allowed.
43:35: Variable 'counts' must be private and have accessor methods.
44:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
44:26: Variable 'percentiles' must be private and have accessor methods.
47: Missing a Javadoc comment.
47:12: 'public' modifier out of order with the JLS suggestions.
56:11: Variable 'state' must be private and have accessor methods.
82:43: Name '_percentiles' must match pattern '^[a-z][a-zA-Z0-9]*$'.
85:28: Expression can be simplified.
105:39: ')' is preceded with whitespace.
117:26: Expression can be simplified.
125:65: Name 'RN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
129:12: Name 'CRN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
130:12: Name 'FRN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
164:12: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
173: Line is longer than 100 characters.
184:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
188:12: Name 'N' must match pattern '^[a-z][a-zA-Z0-9]*$'.
189:14: Name 'RN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
191:16: Name 'P' must match pattern '^[a-z][a-zA-Z0-9]*$'.





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Jerome Boulon (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837522#action_12837522]

Jerome Boulon commented on HIVE-259:


@Carl: How did you get this list?

Also, I'm not sure I understand this: 

Why are HashMap and ArrayList not allowed if they are supported?

43:7: Declaring variables, return values or parameters of type 'HashMap' is not allowed.
44:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
164:12: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
184:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Alex Loddengaard (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837526#action_12837526]

Alex Loddengaard commented on HIVE-259:
---

Hey Jerome,

I assume it's because you're supposed to use the interface type (e.g. Map or 
List) for return types, parameter types, and declaring variables.

Correct me if I'm wrong, those of you more knowledgeable about Hive's 
checkstyle :).

Alex




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-23 Thread Carl Steinbach (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837527#action_12837527]

Carl Steinbach commented on HIVE-259:
-

bq. How did you get this list? 

Run 'ant checkstyle'. The list of violations gets dumped to 
build/checkstyle/checkstyle-errors.txt.

bq. Why HashMap and ArrayList are not allowed if supported?

You're allowed to use ArrayList and HashMap, but you're supposed to refer
to instances of these classes using the interface (List or Map) instead of the
concrete type, e.g.

{code:java}
Map<String, String> myMap = new HashMap<String, String>();

public List<String> getStringList() {
   return new ArrayList<String>();
}
{code}






[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-16 Thread Zheng Shao (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834474#action_12834474]

Zheng Shao commented on HIVE-259:
-

 Is there any limitation on what can be used on the state object or can we use 
 any java Object? 
We support primitive classes, HashMap (translated into map type in Hive), 
ArrayList (array type in Hive), and any simple struct-like classes (struct type 
in Hive).
We support arbitrary levels of nesting, but no recursive types.

 Also how is the state serialized between Map and Reduce?
We use SerDe (see SerDe.serialize(...) ) to serialize/deserialize the objects, 
as well as translations between objects that have the same type (see 
ObjectInspector and ObjectInspectorConverters).





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-10 Thread Zheng Shao (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832134#action_12832134]

Zheng Shao commented on HIVE-259:
-

Jerome, it seems to me that the best data structure for counting is a HashMap, 
which allows near-constant-time insertion, lookup, and update. When we 
terminate, we can get the entries and sort them, but that cost should be small 
(it's a one-time cost, and the number of unique items won't be too big - users 
should have used round() to shrink the number of unique values).

It seems that currently we are paying O(log n) cost for each find and O(n) cost 
for each insertion.

Does that make sense?

For sharing the state object, we can just declare the state class as public 
static.
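Zheng's complexity argument can be made concrete. The sketch below is an illustration, not the code in any of the attached patches, and the class and method names are hypothetical. It keeps the same per-value counts two ways: a sorted list, where `Collections.binarySearch` finds a value in O(log n) but inserting a new distinct value shifts the tail in O(n), and a `HashMap`, where each row is an expected O(1) update and the distinct values are sorted only once at the end.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Two ways to keep per-value counts for a percentile aggregate.
class CountingStructures {

    // Sorted-list approach: binary search finds the slot in O(log n), but
    // inserting a new distinct value shifts later elements, costing O(n).
    static void addSorted(List<Long> keys, List<Long> counts, long v) {
        int i = Collections.binarySearch(keys, v);
        if (i >= 0) {
            counts.set(i, counts.get(i) + 1);  // known value: bump its count
        } else {
            int pos = -(i + 1);                // insertion point
            keys.add(pos, v);                  // O(n) shift
            counts.add(pos, 1L);
        }
    }

    // HashMap approach: expected O(1) per row.
    static void addHashed(Map<Long, Long> counts, long v) {
        counts.merge(v, 1L, Long::sum);
    }

    // One-time sort of the distinct values when the aggregate terminates.
    static List<Long> sortedKeys(Map<Long, Long> counts) {
        List<Long> keys = new ArrayList<>(counts.keySet());
        Collections.sort(keys);
        return keys;
    }

    public static void main(String[] args) {
        long[] rows = {5, 3, 5, 9, 3, 5};
        List<Long> keys = new ArrayList<>();
        List<Long> sortedCounts = new ArrayList<>();
        Map<Long, Long> hashed = new HashMap<>();
        for (long v : rows) {
            addSorted(keys, sortedCounts, v);
            addHashed(hashed, v);
        }
        System.out.println(keys + " " + sortedCounts);  // [3, 5, 9] [2, 3, 1]
        System.out.println(sortedKeys(hashed));         // [3, 5, 9]
    }
}
```

With many rows and few distinct values the per-row cost dominates, which is why the one-time sort at terminate time is cheap by comparison.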





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-10 Thread Todd Lipcon (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832139#action_12832139]

Todd Lipcon commented on HIVE-259:
--

Agreed re HashMap. Also, there should be some kind of setting that limits how 
much RAM gets used up. In a later iteration we could do adaptive histogramming 
once we hit the limit. In this version we should just throw up our hands and 
fail with a message that says the user needs to discretize harder.
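One way to realize Todd's safety valve is to cap the number of distinct keys and fail with an actionable message. In the sketch below the limit is a constructor argument; the class name, the message text, and the idea of reading the limit from a configuration setting are assumptions for illustration, not an existing Hive option.

```java
import java.util.HashMap;
import java.util.Map;

// Counting state that refuses to grow past a fixed number of distinct values.
class BoundedCounts {
    private final Map<Long, Long> counts = new HashMap<>();
    private final int maxDistinct;  // hypothetical limit, e.g. from a config setting

    BoundedCounts(int maxDistinct) {
        this.maxDistinct = maxDistinct;
    }

    void add(long v) {
        // Only a value not seen before can grow the map, so only check then.
        if (!counts.containsKey(v) && counts.size() >= maxDistinct) {
            throw new IllegalStateException("percentile: more than " + maxDistinct
                    + " distinct values; discretize the input (e.g. round or bin it)");
        }
        counts.merge(v, 1L, Long::sum);
    }

    int distinct() {
        return counts.size();
    }

    public static void main(String[] args) {
        BoundedCounts state = new BoundedCounts(2);
        state.add(1);
        state.add(2);
        state.add(1);  // repeat of a known value: still 2 distinct
        try {
            state.add(3);  // a third distinct value exceeds the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```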




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-10 Thread Jerome Boulon (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832146#action_12832146]

Jerome Boulon commented on HIVE-259:


I didn't know that we could use a HashMap in the state object...
Is there any limitation on what can be used on the state object or can we use 
any java Object? 
Also how is the state serialized between Map and Reduce?




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-01-28 Thread Jerome Boulon (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806134#action_12806134]

Jerome Boulon commented on HIVE-259:


It would also be good to be able to ask for more than one percentile, e.g. 
PERCENTILE(column, .99), with only a single structure in memory.
ex: select PERCENTILE(column, .99), PERCENTILE(column, .50) from myTable;
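Several percentile requests do not need several copies of the data: one histogram, sorted once, can answer all of them in a single scan if the requests are visited in increasing order. The sketch below uses hypothetical names and, purely for simplicity, the nearest-rank definition (rank = ceil(p * N), clamped to [1, N]); it is not necessarily the definition the committed function uses. It assumes a non-empty histogram and each p in [0, 1].

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Answers several percentile requests from a single histogram.
class MultiPercentile {

    // For each requested p, returns the smallest value whose cumulative count
    // reaches the nearest rank of p.
    static long[] percentiles(Map<Long, Long> counts, double[] ps) {
        SortedMap<Long, Long> sorted = new TreeMap<>(counts);  // one sort serves every p
        long total = 0;
        for (long c : sorted.values()) {
            total += c;
        }

        // Visit requests in increasing p so the scan only moves forward.
        Integer[] order = new Integer[ps.length];
        for (int i = 0; i < ps.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(ps[a], ps[b]));

        long[] result = new long[ps.length];
        long cumulative = 0;
        int next = 0;
        for (Map.Entry<Long, Long> e : sorted.entrySet()) {
            cumulative += e.getValue();
            while (next < order.length && cumulative >= rank(ps[order[next]], total)) {
                result[order[next]] = e.getKey();
                next++;
            }
        }
        return result;
    }

    // Nearest rank ceil(p * total), clamped to [1, total].
    static long rank(double p, long total) {
        long r = (long) Math.ceil(p * total);
        return Math.min(total, Math.max(1, r));
    }

    public static void main(String[] args) {
        Map<Long, Long> counts = new HashMap<>();
        for (long v : new long[] {10, 20, 20, 30, 40, 40, 40, 50}) {
            counts.merge(v, 1L, Long::sum);
        }
        long[] r = percentiles(counts, new double[] {0.99, 0.5, 0.25});
        System.out.println(Arrays.toString(r));  // [50, 30, 20]
    }
}
```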





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-01-28 Thread Carl Steinbach (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806183#action_12806183]

Carl Steinbach commented on HIVE-259:
-


@Jerome: Agreed. Allowing sort results to be shared by multiple functions (like 
in the following example) is key to supporting analytic functions efficiently.

{code:sql}
SELECT department_id,
   PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) "Median cont",
   PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) "Median disc"
   FROM employees GROUP BY department_id;
{code}




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-01-19 Thread Zheng Shao (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802616#action_12802616]

Zheng Shao commented on HIVE-259:
-

This is a good first step. We can provide some UDFs to bucketize the values 
first in case the user needs it.
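A bucketizing UDF of the kind Zheng mentions could be as simple as snapping each value to the lower edge of a fixed-width bin, which bounds the number of distinct keys a counting-based percentile has to store. The helper below is a hypothetical sketch, not an existing Hive UDF; the bin width is a caller-chosen assumption, and the result is only as precise as that width.

```java
// Hypothetical bucketizing helper: snaps a value to the lower edge of its bin.
class Bucketize {

    static double bucketize(double value, double width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        // floor() keeps negative values in the bin below them, e.g. -0.5 -> -1.0
        return Math.floor(value / width) * width;
    }

    public static void main(String[] args) {
        double[] latenciesMs = {12.3, 17.9, 23.4, 20.0, 99.5};
        for (double v : latenciesMs) {
            System.out.println(v + " -> " + bucketize(v, 10.0));
        }
    }
}
```

With fractional widths such as 0.1, floating-point rounding can push a value that sits exactly on a bin edge into the neighbouring bin (0.3 / 0.1 evaluates to slightly less than 3), so integer widths are the safer choice for this kind of helper.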





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2009-11-24 Thread Carl Steinbach (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781824#action_12781824]

Carl Steinbach commented on HIVE-259:
-

This would be a very useful function to have.

For the sake of completeness (and without much additional effort) it would be 
nice to provide both PERCENTILE_DISC and PERCENTILE_CONT.

PERCENTILE_CONT: 
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions110.htm
PERCENTILE_DISC:  
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions111.htm
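The two Oracle definitions differ in one important way: PERCENTILE_DISC always returns a member of the input, whereas PERCENTILE_CONT may interpolate between two members. The sketch below follows the formulas on the linked Oracle pages: row number RN = 1 + P*(N-1), interpolated between rows FRN = floor(RN) and CRN = ceil(RN) for CONT, and the first value whose cumulative distribution is at least P for DISC. The class and method names are hypothetical, the ordering is ascending (the Oracle example orders by salary DESC), and the input is assumed to be non-empty.

```java
import java.util.Arrays;

// Sketches of Oracle-style PERCENTILE_CONT and PERCENTILE_DISC over a
// non-empty array, ordered ascending.
class OraclePercentiles {

    // PERCENTILE_CONT: RN = 1 + P * (N - 1); interpolate between rows
    // FRN = floor(RN) and CRN = ceil(RN) (row numbers are 1-based).
    static double percentileCont(double[] values, double p) {
        double[] v = values.clone();
        Arrays.sort(v);
        double rn = 1 + p * (v.length - 1);
        int frn = (int) Math.floor(rn);
        int crn = (int) Math.ceil(rn);
        if (frn == crn) {
            return v[frn - 1];  // CRN = FRN = RN: take that row's value
        }
        return (crn - rn) * v[frn - 1] + (rn - frn) * v[crn - 1];
    }

    // PERCENTILE_DISC: the first value whose cumulative distribution
    // (rows up to and including it, divided by N) is >= P.
    static double percentileDisc(double[] values, double p) {
        double[] v = values.clone();
        Arrays.sort(v);
        int n = v.length;
        for (int i = 0; i < n; i++) {
            if ((double) (i + 1) / n >= p) {
                return v[i];
            }
        }
        return v[n - 1];  // only reached if p > 1
    }

    public static void main(String[] args) {
        double[] values = {4, 1, 3, 2};
        System.out.println("cont=" + percentileCont(values, 0.5)
                + ", disc=" + percentileDisc(values, 0.5));  // cont=2.5, disc=2.0
    }
}
```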





[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2009-11-24 Thread Todd Lipcon (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782120#action_12782120]

Todd Lipcon commented on HIVE-259:
--

An easy way to do this that would work for a ton of data sets would be to 
essentially do a counting sort. If you have only a few thousand distinct values 
in the column to be analyzed, just make a hashtable, count how many of each value 
you see, and then in the single reducer use the histogram to figure out the 
percentile. This should work great for datasets like age, and even for sets like 
the number of days since a user signed up. For sets that are truly continuous, it 
would be useful when combined with a binning UDF to discretize them.

Sadly it doesn't cover the general case, but it would be an easy first step.
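Todd's counting-sort idea can be sketched as follows: keep only a histogram of distinct values, and locate a given row of the implicitly sorted data by walking cumulative counts. The sketch below uses hypothetical names and applies PERCENTILE_CONT-style interpolation, RN = 1 + p*(N-1), on top of the histogram; that choice of definition is an assumption for illustration. It assumes a non-empty histogram and 0 <= p <= 1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Percentile from a histogram of distinct values (counting-sort style).
class HistogramPercentile {

    // Value at 1-based row number `row` of the implicitly sorted data.
    static long valueAtRow(SortedMap<Long, Long> hist, long row) {
        long seen = 0;
        for (Map.Entry<Long, Long> e : hist.entrySet()) {
            seen += e.getValue();
            if (seen >= row) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("row out of range: " + row);
    }

    // Interpolates between rows floor(RN) and ceil(RN), where RN = 1 + p * (N - 1).
    static double percentile(Map<Long, Long> counts, double p) {
        SortedMap<Long, Long> hist = new TreeMap<>(counts);  // sort distinct values once
        long n = 0;
        for (long c : hist.values()) {
            n += c;
        }
        double rn = 1 + p * (n - 1);
        long frn = (long) Math.floor(rn);
        long crn = (long) Math.ceil(rn);
        long lo = valueAtRow(hist, frn);
        if (frn == crn) {
            return lo;  // RN falls exactly on a row
        }
        long hi = valueAtRow(hist, crn);
        return (crn - rn) * lo + (rn - frn) * hi;
    }

    public static void main(String[] args) {
        // Ages with many repeats: 8 rows but only 4 distinct values.
        int[] ages = {30, 25, 30, 41, 25, 30, 52, 30};
        Map<Long, Long> counts = new HashMap<>();
        for (int a : ages) {
            counts.merge((long) a, 1L, Long::sum);
        }
        System.out.println(counts.size() + " distinct values");      // 4 distinct values
        System.out.println("median = " + percentile(counts, 0.5));  // median = 30.0
    }
}
```

Memory grows with the number of distinct values rather than the number of rows, which is what makes this attractive for columns like age.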




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2009-01-29 Thread Edward Capriolo (JIRA)

[https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668699#action_12668699]

Edward Capriolo commented on HIVE-259:
--

The 95th percentile is very often used in Internet Service Provider billing, so 
this might be useful. 

The percentile calculation is a sort followed by picking an element. The syntax 
could be like:

* PERCENTILE(column, .99) 
* PERCENTILE(column, .50)

In this manner you could do any percentile.
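Edward's description, sort and then pick an element, corresponds to the nearest-rank method: sort the N samples and return the one at 1-based rank ceil(p * N). The sketch below uses hypothetical names and made-up sample data. In the billing use case this means roughly the top 5% of samples are ignored, so a single short burst does not set the billed rate.

```java
import java.util.Arrays;

// "Sort, then pick an element": the nearest-rank percentile.
class NearestRankPercentile {

    // Returns the sample at 1-based rank ceil(p * N) after sorting.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 1) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 1");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // 20 made-up 5-minute traffic samples (Mbit/s): 1..19 plus one burst.
        double[] mbps = new double[20];
        for (int i = 0; i < mbps.length; i++) {
            mbps[i] = i + 1;
        }
        mbps[19] = 500;
        System.out.println("95th percentile = " + percentile(mbps, 0.95));
    }
}
```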
