[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12863415#action_12863415 ]
John Sichi commented on HIVE-259:
PERCENTILE docs are still missing on the consolidated page: http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF

Add PERCENTILE aggregate function
Key: HIVE-259
URL: https://issues.apache.org/jira/browse/HIVE-259
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
Fix For: 0.6.0
Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, HIVE-259.4.patch, HIVE-259.5.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx
Compute at least the 25th, 50th, and 75th percentiles.
--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860061#action_12860061 ]
John Sichi commented on HIVE-259:
I couldn't see the point of having two competing UDF guide pages, so I renamed the XPath-specific one as such and linked it from the main one. Just housekeeping to reduce confusion; I did not actually add the percentile info.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12858600#action_12858600 ]
Ning Zhang commented on HIVE-259:
Hi Jerome and Zheng, could either of you write up the syntax and semantics of the percentile function on the wiki page (http://wiki.apache.org/hadoop/Hive/LanguageManual/UDF or http://wiki.apache.org/hadoop/Hive/HiveUDFGuide)? Thanks.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839391#action_12839391 ]
He Yongqiang commented on HIVE-259:
The code looks very good. Thanks for the code work, Jerome and Zheng! Just some minor comments:
(1) I am not familiar with the exact definition of the percentile function. Must percentile()'s result be a member of the input data?
(2) HashMap and ArrayList are used to copy and sort. Could we use a TreeMap here? This is a small point and can be ignored.
At the beginning of the new test case, DESCRIBE FUNCTION percentile; DESCRIBE FUNCTION EXTENDED percentile; appears twice.
This is a very good function to have, and it would be great if we could document its usage on the wiki page or somewhere similar.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839393#action_12839393 ]
Zheng Shao commented on HIVE-259:
bq. (1) I am not familiar with the exact definition of percentile function. Is the percentile()'s result must be a member of input data?
See the link above.
bq. (2) HashMap and ArrayList is used to copy and sort. Can we use tree map here? this is a small and can be ignored.
I think HashMap is better here. The reason is that the number of iterate() calls is usually much higher than the number of unique numbers (the size of the HashMap). By using HashMap we reduce the cost of each iterate() call.
bq. In the beginning of new test case, .. appears two times
Fixed in HIVE-259.5.patch
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839394#action_12839394 ]
He Yongqiang commented on HIVE-259:
Looks good, will test and commit.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838512#action_12838512 ]
Jerome Boulon commented on HIVE-259:
Can someone explain how I can create/populate a new table to be used by the ant test target?
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838516#action_12838516 ]
Carl Steinbach commented on HIVE-259:
@Jerome: take a look at ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838718#action_12838718 ]
Zheng Shao commented on HIVE-259:
Hi Jerome, using ArrayList<Integer> won't cause unnecessary Object creation. We will just create a single ArrayList<Integer> and use it forever. Does that make sense?
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838735#action_12838735 ]
Todd Lipcon commented on HIVE-259:
Doesn't the autoboxing of Integer types actually allocate objects? I think the JVM only flyweights integers for very small ones (iirc only from -128 to 127).
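The boxing behaviour discussed above can be checked with a small sketch. The class and method names here are hypothetical and are not part of Hive; the only fact relied on is that the Java Language Specification requires boxed int values in the range -128 to 127 to be shared instances. Outside that range a JVM will usually allocate a new Integer, but the upper bound of the cache can be tuned, so the sketch makes no promise there.

```java
// Illustrative only: BoxingDemo is a hypothetical name, not Hive code.
final class BoxingDemo {
    // Autoboxing goes through Integer.valueOf, which must return a shared
    // instance for values in [-128, 127].
    static boolean sameBoxedInstance(int value) {
        Integer a = value; // autoboxed
        Integer b = value; // autoboxed again
        return a == b;     // reference comparison, not numeric equality
    }

    public static void main(String[] args) {
        System.out.println(sameBoxedInstance(127));  // inside the guaranteed cache
        System.out.println(sameBoxedInstance(-128)); // inside the guaranteed cache
        // Usually a fresh allocation per boxing, but not guaranteed either way.
        System.out.println(sameBoxedInstance(1000));
    }
}
```

For a per-row code path, this suggests that boxing arbitrary values may allocate, which seems to be the concern raised in the comment.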
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838118#action_12838118 ]
Zheng Shao commented on HIVE-259:
Also see http://wiki.apache.org/hadoop/Hive/HowToContribute#Coding_Convention
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838119#action_12838119 ]
Zheng Shao commented on HIVE-259:
The test cases look a bit too trivial, or do the results have problems? They always return the same number for the 3 different percentile values.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838173#action_12838173 ]
Jerome Boulon commented on HIVE-259:
- From my point of view, changing variable access to private in the state object will not make the code more readable ...
- I'll change all variables to lowerCase to match Java style; the current variable names are based on the Oracle definition.
@Zheng - I'm not using an ArrayList<Integer> but a String to avoid unnecessary object creation (for every single row) ... It would be even better if the constructor could have been used, but I haven't found how to do that. If we care about 1 extra empty ArrayList per mapper/spill in memory, then we should care about creating (1 ArrayList + 1 Integer object per percentile) per row.
@Zheng - Regarding the test case, that is what I had in mind when I asked you how to create my own table, and that is exactly the reason why I posted the jb2.* files.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837500#action_12837500 ]
Carl Steinbach commented on HIVE-259:
Please fix the new Checkstyle errors in UDAFPercentile.java:
35: Missing a Javadoc comment.
39: Missing a Javadoc comment.
39:10: 'public' modifier out of order with the JLS suggestions.
41: Missing a Javadoc comment.
41:12: 'public' modifier out of order with the JLS suggestions.
42:15: Variable 'initDone' must be private and have accessor methods.
43:7: Declaring variables, return values or parameters of type 'HashMap' is not allowed.
43:35: Variable 'counts' must be private and have accessor methods.
44:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
44:26: Variable 'percentiles' must be private and have accessor methods.
47: Missing a Javadoc comment.
47:12: 'public' modifier out of order with the JLS suggestions.
56:11: Variable 'state' must be private and have accessor methods.
82:43: Name '_percentiles' must match pattern '^[a-z][a-zA-Z0-9]*$'.
85:28: Expression can be simplified.
105:39: ')' is preceded with whitespace.
117:26: Expression can be simplified.
125:65: Name 'RN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
129:12: Name 'CRN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
130:12: Name 'FRN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
164:12: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
173: Line is longer than 100 characters.
184:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
188:12: Name 'N' must match pattern '^[a-z][a-zA-Z0-9]*$'.
189:14: Name 'RN' must match pattern '^[a-z][a-zA-Z0-9]*$'.
191:16: Name 'P' must match pattern '^[a-z][a-zA-Z0-9]*$'.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837522#action_12837522 ]
Jerome Boulon commented on HIVE-259:
@Carl: How did you get this list? Also, I'm not sure I understand this: why are HashMap and ArrayList not allowed if they are supported?
43:7: Declaring variables, return values or parameters of type 'HashMap' is not allowed.
44:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
164:12: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
184:7: Declaring variables, return values or parameters of type 'ArrayList' is not allowed.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837526#action_12837526 ]
Alex Loddengaard commented on HIVE-259:
Hey Jerome, I assume it's because you're supposed to use the interface type (e.g. Map or List) for return types, parameter types, and declaring variables. Correct me if I'm wrong, those of you more knowledgeable about Hive's checkstyle :).
Alex
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12837527#action_12837527 ]
Carl Steinbach commented on HIVE-259:
bq. How did you get this list?
Run 'ant checkstyle'. The list of violations gets dumped to build/checkstyle/checkstyle-errors.txt.
bq. Why HashMap and ArrayList are not allowed if supported?
You're allowed to use ArrayList and HashMap, but you're supposed to refer to instances of these classes using the interface (List or Map) instead of the concrete type, e.g.
{code:java}
Map<String, String> myMap = new HashMap<String, String>();

public List<String> getStringList() {
  return new ArrayList<String>();
}
{code}
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834474#action_12834474 ]
Zheng Shao commented on HIVE-259:
bq. Is there any limitation on what can be used on the state object or can we use any java Object?
We support primitive classes, HashMap (translated into map type in Hive), ArrayList (array type in Hive), and any simple struct-like classes (struct type in Hive). We support arbitrary levels of nesting, but no recursive types.
bq. Also how is the state serialized between Map and Reduce?
We use SerDe (see SerDe.serialize(...) ) to serialize/deserialize the objects, as well as translations between objects that have the same type (see ObjectInspector and ObjectInspectorConverters).
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832134#action_12832134 ]
Zheng Shao commented on HIVE-259:
Jerome, it seems to me that the best data structure for counting is a HashMap, which allows near-constant-time insertion and lookup. When we terminate we can get the entries and sort them, but that cost should be small (it's a one-time cost and the number of unique items won't be too big; users should have used round to shrink the number of unique numbers). It seems that currently we are paying O(log n) cost for each find and O(n) cost for each insertion. Does that make sense?
For sharing the state object, we can just declare the state class as public static.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832139#action_12832139 ]
Todd Lipcon commented on HIVE-259:
Agreed re HashMap. Also, there should be some kind of setting that limits how much RAM gets used up. In a later iteration we could do adaptive histogramming once we hit the limit. In this version we should just throw up our hands and fail with a message that says the user needs to discretize harder.
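One way to read the suggestion above is a counting map that refuses to grow past a configured number of distinct values. The following is a minimal sketch under that assumption; BoundedCounter, maxDistinct, and the error message are hypothetical names chosen for illustration and do not come from the HIVE-259 patches. It bounds the number of distinct keys rather than bytes of RAM, which is only a rough proxy for memory use.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a value counter that fails fast instead of growing
// without bound. Not taken from the HIVE-259 patches.
final class BoundedCounter {
    private final Map<Long, Long> counts = new HashMap<Long, Long>();
    private final int maxDistinct; // hypothetical limit, e.g. from a config setting

    BoundedCounter(int maxDistinct) {
        this.maxDistinct = maxDistinct;
    }

    // Count one occurrence of value; reject a new distinct value at the limit.
    void add(long value) {
        Long current = counts.get(value);
        if (current == null) {
            if (counts.size() >= maxDistinct) {
                throw new IllegalStateException("More than " + maxDistinct
                    + " distinct values; round or bin the input first");
            }
            counts.put(value, 1L);
        } else {
            counts.put(value, current + 1L);
        }
    }

    int distinct() {
        return counts.size();
    }

    long count(long value) {
        Long c = counts.get(value);
        return c == null ? 0L : c;
    }
}
```

Values already present can still be counted after the limit is reached; only a new distinct value triggers the failure.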
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832146#action_12832146 ]
Jerome Boulon commented on HIVE-259:
I didn't know that we can use a hash in the state object ... Is there any limitation on what can be used in the state object, or can we use any Java Object? Also, how is the state serialized between Map and Reduce?
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806134#action_12806134 ]
Jerome Boulon commented on HIVE-259:
It would also be good to be able to ask for more than one PERCENTILE(column, .99) with only a single structure in memory, e.g.:
select PERCENTILE(column, .99), PERCENTILE(column, .50) from myTable;
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12806183#action_12806183 ]
Carl Steinbach commented on HIVE-259:
@Jerome: Agreed. Allowing sort results to be shared by multiple functions (like in the following example) is key to supporting analytic functions efficiently.
{code:sql}
SELECT department_id,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary DESC) "Median cont",
  PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY salary DESC) "Median disc"
FROM employees
GROUP BY department_id;
{code}
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12802616#action_12802616 ]
Zheng Shao commented on HIVE-259:
This is a good first step. We can provide some UDFs to bucketize the values first in case the user needs it.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781824#action_12781824 ]
Carl Steinbach commented on HIVE-259:
This would be a very useful function to have. For the sake of completeness (and without much additional effort) it would be nice to provide both PERCENTILE_DISC and PERCENTILE_CONT.
PERCENTILE_CONT: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions110.htm
PERCENTILE_DISC: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions111.htm
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782120#action_12782120 ]
Todd Lipcon commented on HIVE-259:
An easy way to do this that would work for a ton of data sets would be to essentially do a counting sort. If you have only a few thousand distinct values in the column to be analyzed, just make a hashtable, count up how many of each you see, and then in the single reducer use the histogram to figure out the percentile. This should work great for datasets like age, and even for sets like number of days since a user signed up. For sets that are truly continuous, it would be useful when combined with a binning UDF to discretize them. Sadly it's not the general case, but it would be an easy first step.
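The counting approach described above can be sketched in a few lines. This is an illustration only, not the UDAFPercentile code from the patches: the class and method names are hypothetical, values are assumed to be long, and the sketch assumes linear interpolation between neighbouring ranks in the spirit of PERCENTILE_CONT, using a zero-based position p * (N - 1).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: percentile from a value -> count histogram.
final class HistogramPercentile {
    // Count occurrences; cost per row is one hash lookup and one put.
    static Map<Long, Long> histogram(long[] values) {
        Map<Long, Long> counts = new HashMap<Long, Long>();
        for (long v : values) {
            Long c = counts.get(v);
            counts.put(v, c == null ? 1L : c + 1L);
        }
        return counts;
    }

    // Sort the distinct values once, then interpolate at position p * (N - 1).
    static double percentile(Map<Long, Long> counts, double p) {
        if (counts.isEmpty() || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need data and 0 <= p <= 1");
        }
        TreeMap<Long, Long> sorted = new TreeMap<Long, Long>(counts);
        long n = 0;
        for (long c : sorted.values()) {
            n += c;
        }
        double position = p * (n - 1);
        long lower = (long) Math.floor(position);
        long upper = (long) Math.ceil(position);
        long lowerValue = valueAt(sorted, lower);
        if (lower == upper) {
            return lowerValue;
        }
        long upperValue = valueAt(sorted, upper);
        return (upper - position) * lowerValue + (position - lower) * upperValue;
    }

    // Value found at a zero-based index of the (virtual) fully sorted input.
    private static long valueAt(TreeMap<Long, Long> sorted, long index) {
        long seen = 0;
        for (Map.Entry<Long, Long> e : sorted.entrySet()) {
            seen += e.getValue();
            if (index < seen) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("index out of range");
    }
}
```

Memory use grows with the number of distinct values rather than the number of rows, which is why rounding or binning the input first keeps the histogram small.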
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12668699#action_12668699 ]
Edward Capriolo commented on HIVE-259:
The 95th percentile is very often used in Internet Service Provider billing, so this might be useful. The percentile calculation is a sort followed by picking an element. The syntax could be like:
* PERCENTILE(column, .99)
* PERCENTILE(column, .50)
In this manner you could do any percentile.
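The "sort, then pick an element" description above can be sketched as follows. The comment does not say which element to pick, so this sketch assumes the nearest-rank rule (rank = ceil(p * N), with a minimum rank of 1); SortAndPick is a hypothetical name and the code is not taken from Hive.

```java
import java.util.Arrays;

// Illustrative only: nearest-rank percentile over an in-memory array.
final class SortAndPick {
    static long percentile(long[] values, double p) {
        if (values.length == 0 || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need data and 0 <= p <= 1");
        }
        long[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        // Nearest-rank rule: one-based rank ceil(p * N), clamped to at least 1.
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] usage = {15, 20, 35, 40, 50};
        System.out.println(percentile(usage, 0.50));
        System.out.println(percentile(usage, 0.99));
    }
}
```

Under this rule the result is always a member of the input, unlike interpolating definitions, which can return a value between two inputs.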