[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781824#action_12781824 ] Carl Steinbach commented on HIVE-259: - This would be a very useful function to have. For the sake of completeness (and without much additional effort), it would be nice to provide both PERCENTILE_DISC and PERCENTILE_CONT. PERCENTILE_CONT: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions110.htm PERCENTILE_DISC: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions111.htm Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Compute at least the 25th, 50th, and 75th percentiles. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
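For reference, here is a minimal in-memory sketch of the two semantics linked above, following the Oracle definitions: PERCENTILE_CONT interpolates linearly between the values at row numbers floor(RN) and ceil(RN), where RN = 1 + P*(N-1), while PERCENTILE_DISC returns the smallest value whose cumulative distribution (CUME_DIST) is at least P. The class and method names are hypothetical and this is not a Hive UDAF; a real aggregate would have to combine partial results across mappers instead of sorting one array.

```java
import java.util.Arrays;

// Hypothetical class: Oracle-style PERCENTILE_CONT / PERCENTILE_DISC over an in-memory array.
final class PercentileSketch {

    private static double[] sortedCopy(double[] values, double p) {
        // The negated range check also rejects NaN.
        if (values.length == 0 || !(p >= 0.0 && p <= 1.0)) {
            throw new IllegalArgumentException("need non-empty input and 0 <= p <= 1");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    // PERCENTILE_CONT: linear interpolation around row number RN = 1 + p * (N - 1).
    static double percentileCont(double[] values, double p) {
        double[] sorted = sortedCopy(values, p);
        double rn = 1.0 + p * (sorted.length - 1);
        int frn = (int) Math.floor(rn); // 1-based row at or below RN
        int crn = (int) Math.ceil(rn);  // 1-based row at or above RN
        if (frn == crn) {
            return sorted[frn - 1];
        }
        return (crn - rn) * sorted[frn - 1] + (rn - frn) * sorted[crn - 1];
    }

    // PERCENTILE_DISC: smallest value whose cumulative distribution (CUME_DIST) is >= p.
    static double percentileDisc(double[] values, double p) {
        double[] sorted = sortedCopy(values, p);
        int n = sorted.length;
        for (int i = 0; i < n; i++) {
            // Move to the last copy of a tied value so CUME_DIST counts all ties.
            if (i + 1 < n && sorted[i + 1] == sorted[i]) {
                continue;
            }
            double cumeDist = (i + 1) / (double) n;
            if (cumeDist >= p) {
                return sorted[i];
            }
        }
        return sorted[n - 1]; // not reached when 0 <= p <= 1
    }

    public static void main(String[] args) {
        double[] data = {4, 1, 3, 2};
        for (double p : new double[] {0.25, 0.5, 0.75}) {
            System.out.printf("p=%.2f cont=%.2f disc=%.2f%n",
                    p, percentileCont(data, p), percentileDisc(data, p));
        }
    }
}
```

On {1, 2, 3, 4} the two medians differ (PERCENTILE_CONT gives 2.5, PERCENTILE_DISC gives 2), which is the practical reason to expose both.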
Build failed in Hudson: Hive-trunk-h0.18 #285
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/285/changes Changes: [heyongqiang] custom mappers/reducers should not be initialized at compile time -- [...truncated 10866 lines...] [junit] POSTHOOK: Output: defa...@src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] POSTHOOK: Output: defa...@src_thrift [junit] OK [junit] Loading data to table src_json [junit] POSTHOOK: Output: defa...@src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_function2.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_function2.q.out [junit] Done query: unknown_function2.q [junit] Begin query: unknown_function3.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12 [junit] OK [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table src [junit] POSTHOOK: Output: defa...@src [junit] OK [junit] Loading data to table src1 [junit] POSTHOOK: Output: defa...@src1 [junit] OK [junit] Loading data to table src_sequencefile [junit] POSTHOOK: Output: defa...@src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] POSTHOOK: Output: defa...@src_thrift [junit] OK [junit] Loading 
data to table src_json [junit] POSTHOOK: Output: defa...@src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_function3.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_function3.q.out [junit] Done query: unknown_function3.q [junit] Begin query: unknown_function4.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12 [junit] OK [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table src [junit] POSTHOOK: Output: defa...@src [junit] OK [junit] Loading data to table src1 [junit] POSTHOOK: Output: defa...@src1 [junit] OK [junit] Loading data to table src_sequencefile [junit] POSTHOOK: Output: defa...@src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] POSTHOOK: Output: defa...@src_thrift [junit] OK [junit] Loading data to table src_json [junit] POSTHOOK: Output: defa...@src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_function4.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_function4.q.out [junit] Done query: 
unknown_function4.q [junit] Begin query: unknown_table1.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit]
Build failed in Hudson: Hive-trunk-h0.20 #109
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/109/changes Changes: [heyongqiang] custom mappers/reducers should not be initialized at compile time -- [...truncated 10878 lines...] [junit] POSTHOOK: Output: defa...@src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] POSTHOOK: Output: defa...@src_thrift [junit] OK [junit] Loading data to table src_json [junit] POSTHOOK: Output: defa...@src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_function2.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_function2.q.out [junit] Done query: unknown_function2.q [junit] Begin query: unknown_function3.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12 [junit] OK [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table src [junit] POSTHOOK: Output: defa...@src [junit] OK [junit] Loading data to table src1 [junit] POSTHOOK: Output: defa...@src1 [junit] OK [junit] Loading data to table src_sequencefile [junit] POSTHOOK: Output: defa...@src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] POSTHOOK: Output: defa...@src_thrift [junit] OK [junit] Loading 
data to table src_json [junit] POSTHOOK: Output: defa...@src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_function3.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_function3.q.out [junit] Done query: unknown_function3.q [junit] Begin query: unknown_function4.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12 [junit] OK [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table srcbucket [junit] POSTHOOK: Output: defa...@srcbucket [junit] OK [junit] Loading data to table src [junit] POSTHOOK: Output: defa...@src [junit] OK [junit] Loading data to table src1 [junit] POSTHOOK: Output: defa...@src1 [junit] OK [junit] Loading data to table src_sequencefile [junit] POSTHOOK: Output: defa...@src_sequencefile [junit] OK [junit] Loading data to table src_thrift [junit] POSTHOOK: Output: defa...@src_thrift [junit] OK [junit] Loading data to table src_json [junit] POSTHOOK: Output: defa...@src_json [junit] OK [junit] diff http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_function4.q.out http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_function4.q.out [junit] Done query: 
unknown_function4.q [junit] Begin query: unknown_table1.q [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11} [junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11 [junit] OK [junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12} [junit]
[jira] Commented: (HIVE-928) ScriptOperator does not set CLASSPATH of spawned process.
[ https://issues.apache.org/jira/browse/HIVE-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782084#action_12782084 ] Carl Steinbach commented on HIVE-928: - Yes, I'll post an updated patch by the end of the day. Sorry for the wait. ScriptOperator does not set CLASSPATH of spawned process. - Key: HIVE-928 URL: https://issues.apache.org/jira/browse/HIVE-928 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-928, HIVE-928.patch ScriptOperator does not set the CLASSPATH of the spawned script process. In practice, this means that Java JARs added with the ADD JAR command are not accessible to TRANSFORM/MAP/REDUCE operators unless the user can guess the location of the JAR archive on each node.
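A minimal sketch of the kind of fix described, assuming the local paths of the jars registered with ADD JAR are already known when the script is launched: java.lang.ProcessBuilder starts from a copy of the parent's environment, so those paths can be prepended to CLASSPATH before the child process starts. The helper name, the example command, and the jar path are hypothetical; this is not the code in the HIVE-928 patch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Hive's ScriptOperator: builds a child process whose
// CLASSPATH includes jars the session registered with ADD JAR.
final class ScriptClasspathSketch {

    static ProcessBuilder buildScriptProcess(List<String> command, List<String> addedJarPaths) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() starts as a copy of this JVM's environment; edits made here
        // apply only to processes started from this builder.
        Map<String, String> env = pb.environment();
        List<String> entries = new ArrayList<>(addedJarPaths);
        String inherited = env.get("CLASSPATH");
        if (inherited != null && !inherited.isEmpty()) {
            entries.add(inherited); // keep the parent's CLASSPATH after the added jars
        }
        if (!entries.isEmpty()) {
            env.put("CLASSPATH", String.join(File.pathSeparator, entries));
        }
        return pb;
    }

    public static void main(String[] args) {
        // Hypothetical command and jar location, for illustration only.
        ProcessBuilder pb = buildScriptProcess(
                List.of("java", "MyTransform"),
                List.of("/tmp/hive-added-jars/my-udfs.jar"));
        System.out.println("CLASSPATH for child: " + pb.environment().get("CLASSPATH"));
        // pb.start() would launch the script with this environment.
    }
}
```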
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782120#action_12782120 ] Todd Lipcon commented on HIVE-259: -- An easy way to do this that would work for a ton of data sets would be to essentially do a counting sort. If you have only a few thousand distinct values in the column to be analyzed, just build a hashtable counting how many times you see each value, and then, in the single reducer, use the resulting histogram to figure out the percentile. This should work great for datasets like age, and even for sets like the number of days since a user signed up. For data that is truly continuous, it would be useful when combined with a binning UDF to discretize the values. Sadly, it doesn't handle the general case, but it would be an easy first step.
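Todd's counting-sort idea can be sketched as follows, assuming the column holds a modest number of distinct integer values: each mapper builds a value-to-count histogram, the single reducer merges the partial histograms, and a walk over the distinct values in ascending order finds the first one whose cumulative count reaches the target rank. The percentile definition here is nearest-rank (target rank ceil(p * N)), one choice among several; the bin helper stands in for the hypothetical binning UDF, and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of the histogram ("counting sort") approach; names are hypothetical.
final class HistogramPercentile {

    // Map side: count how often each distinct value occurs.
    static Map<Long, Long> countValues(long[] values) {
        Map<Long, Long> counts = new HashMap<>();
        for (long v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }

    // Single reducer: fold one mapper's partial histogram into the running total.
    static void mergeInto(Map<Long, Long> total, Map<Long, Long> partial) {
        partial.forEach((value, count) -> total.merge(value, count, Long::sum));
    }

    // Nearest-rank percentile: smallest value whose cumulative count reaches ceil(p * N).
    static long percentile(Map<Long, Long> histogram, double p) {
        if (histogram.isEmpty() || !(p >= 0.0 && p <= 1.0)) {
            throw new IllegalArgumentException("need a non-empty histogram and 0 <= p <= 1");
        }
        TreeMap<Long, Long> sorted = new TreeMap<>(histogram); // distinct values, ascending
        long n = 0;
        for (long c : sorted.values()) {
            n += c;
        }
        long targetRank = Math.max(1L, (long) Math.ceil(p * n));
        long cumulative = 0;
        for (Map.Entry<Long, Long> e : sorted.entrySet()) {
            cumulative += e.getValue();
            if (cumulative >= targetRank) {
                return e.getKey();
            }
        }
        return sorted.lastKey(); // not reached when 0 <= p <= 1
    }

    // Hypothetical binning step for continuous data: index of the width-sized bucket holding x.
    static long bin(double x, double width) {
        return (long) Math.floor(x / width);
    }

    public static void main(String[] args) {
        Map<Long, Long> total = new HashMap<>();
        mergeInto(total, countValues(new long[] {30, 25, 40, 25})); // "mapper 1"
        mergeInto(total, countValues(new long[] {35, 30, 30, 50})); // "mapper 2"
        for (double p : new double[] {0.25, 0.5, 0.75}) {
            System.out.println("p=" + p + " -> " + percentile(total, p));
        }
    }
}
```

Reducer memory grows with the number of distinct values rather than the number of rows, which is why this suits columns like age but needs binning for continuous data.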
[jira] Updated: (HIVE-947) Add run length encoding into RCFile's block header
[ https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-947: -- Attachment: hive-947-2009-11-24.patch Add run length encoding into RCFile's block header --- Key: HIVE-947 URL: https://issues.apache.org/jira/browse/HIVE-947 Project: Hadoop Hive Issue Type: Improvement Reporter: He Yongqiang Assignee: He Yongqiang Priority: Minor Attachments: hive-947-2009-11-22.patch, hive-947-2009-11-24.patch When RCFile constructs rows, it needs to get each column value's length by calling readVLong(). This should be avoided for fixed-length or mostly fixed-length columns. The change also should not affect existing RCFile files: it must still work correctly on files written in the previous RCFile format.
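To illustrate the idea, here is a hypothetical run-length scheme for one column's value lengths: consecutive equal lengths are stored as a single (length, runCount) pair, so a fixed-length column collapses to one pair and most per-row length decoding can be skipped. This generic sketch uses an in-memory list; it is not the on-disk format introduced by the hive-947 patches, and it does not address compatibility with older RCFile files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical run-length scheme for one column's value lengths; not RCFile's real format.
final class LengthRunLength {

    // Encode lengths as a flat list of pairs: length0, run0, length1, run1, ...
    static List<Integer> encode(int[] lengths) {
        List<Integer> pairs = new ArrayList<>();
        int i = 0;
        while (i < lengths.length) {
            int len = lengths[i];
            int run = 1;
            while (i + run < lengths.length && lengths[i + run] == len) {
                run++;
            }
            pairs.add(len);
            pairs.add(run);
            i += run;
        }
        return pairs;
    }

    // Expand (length, run) pairs back into one length per row.
    static int[] decode(List<Integer> pairs) {
        if (pairs.size() % 2 != 0) {
            throw new IllegalArgumentException("pairs must have even size");
        }
        int rows = 0;
        for (int k = 1; k < pairs.size(); k += 2) {
            rows += pairs.get(k);
        }
        int[] lengths = new int[rows];
        int pos = 0;
        for (int k = 0; k < pairs.size(); k += 2) {
            int len = pairs.get(k);
            int run = pairs.get(k + 1);
            Arrays.fill(lengths, pos, pos + run, len);
            pos += run;
        }
        return lengths;
    }

    public static void main(String[] args) {
        int[] fixedWidth = {8, 8, 8, 8, 8, 8}; // e.g. a column of fixed-width values
        int[] mixed = {4, 4, 4, 7, 7, 4};
        System.out.println(encode(fixedWidth)); // a single (length, run) pair
        System.out.println(encode(mixed));
        System.out.println(Arrays.toString(decode(encode(mixed))));
    }
}
```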
[jira] Created: (HIVE-953) script_broken_pipe3.q broken
script_broken_pipe3.q broken Key: HIVE-953 URL: https://issues.apache.org/jira/browse/HIVE-953 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.5.0 Reporter: Namit Jain Assignee: Paul Yang The negative test script_broken_pipe3.q is broken if we allow partial consumption. For now, I have disabled partial consumption. Can you take a look?