[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2009-11-24 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781824#action_12781824
 ] 

Carl Steinbach commented on HIVE-259:
-

This would be a very useful function to have.

For the sake of completeness (and without much additional effort) it would be 
nice to provide both PERCENTILE_DISC and PERCENTILE_CONT.

PERCENTILE_CONT: 
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions110.htm
PERCENTILE_DISC:  
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions111.htm
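Here is a minimal sketch of the two semantics, following the formulas on the linked Oracle pages. PERCENTILE_CONT interpolates linearly at row number RN = 1 + P*(N-1). PERCENTILE_DISC returns the smallest value whose cumulative distribution is >= P. The sketch works on an in-memory array only. PercentileSketch, cont and disc are hypothetical names and are not Hive's UDAF API.

```java
import java.util.Arrays;

// Hypothetical helper illustrating Oracle-style PERCENTILE_CONT and
// PERCENTILE_DISC over an in-memory array (not a Hive API).
final class PercentileSketch {
    private PercentileSketch() {}

    // Validates input and returns a sorted copy (the caller's array is untouched).
    private static double[] sortedCopy(double[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    // PERCENTILE_CONT: RN = 1 + p * (N - 1); interpolate between the rows at
    // FRN = floor(RN) and CRN = ceil(RN) (1-based row numbers).
    static double cont(double[] values, double p) {
        double[] s = sortedCopy(values, p);
        double rn = 1.0 + p * (s.length - 1);
        int frn = (int) Math.floor(rn);
        int crn = (int) Math.ceil(rn);
        if (frn == crn) {
            return s[frn - 1];
        }
        return (crn - rn) * s[frn - 1] + (rn - frn) * s[crn - 1];
    }

    // PERCENTILE_DISC: smallest value whose cumulative distribution
    // (fraction of rows <= value) is >= p.
    static double disc(double[] values, double p) {
        double[] s = sortedCopy(values, p);
        int idx = (int) Math.ceil(p * s.length) - 1;
        return s[Math.max(idx, 0)];
    }
}
```

For example, over {10, 20, 30, 40} at P = 0.5, cont returns 25.0 (halfway between 20 and 30) while disc returns 20.0, an actual value from the column. That is the practical difference between the two functions.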


 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer

 Compute at least the 25th, 50th, and 75th percentiles.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.18 #285

2009-11-24 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/285/changes

Changes:

[heyongqiang] custom mappers/reducers should not be initialized at compile time

--
[...truncated 10866 lines...]
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_function2.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_function2.q.out
[junit] Done query: unknown_function2.q
[junit] Begin query: unknown_function3.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_function3.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_function3.q.out
[junit] Done query: unknown_function3.q
[junit] Begin query: unknown_function4.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_function4.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_function4.q.out
[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] 

Build failed in Hudson: Hive-trunk-h0.20 #109

2009-11-24 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/109/changes

Changes:

[heyongqiang] custom mappers/reducers should not be initialized at compile time

--
[...truncated 10878 lines...]
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_function2.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_function2.q.out
[junit] Done query: unknown_function2.q
[junit] Begin query: unknown_function3.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_function3.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_function3.q.out
[junit] Done query: unknown_function3.q
[junit] Begin query: unknown_function4.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=12
[junit] OK
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] POSTHOOK: Output: defa...@srcbucket
[junit] OK
[junit] Loading data to table src
[junit] POSTHOOK: Output: defa...@src
[junit] OK
[junit] Loading data to table src1
[junit] POSTHOOK: Output: defa...@src1
[junit] OK
[junit] Loading data to table src_sequencefile
[junit] POSTHOOK: Output: defa...@src_sequencefile
[junit] OK
[junit] Loading data to table src_thrift
[junit] POSTHOOK: Output: defa...@src_thrift
[junit] OK
[junit] Loading data to table src_json
[junit] POSTHOOK: Output: defa...@src_json
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ql/test/logs/negative/unknown_function4.q.out
 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ql/src/test/results/compiler/errors/unknown_function4.q.out
[junit] Done query: unknown_function4.q
[junit] Begin query: unknown_table1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-08/hr=12
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] POSTHOOK: Output: defa...@srcpart@ds=2008-04-09/hr=11
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] 

[jira] Commented: (HIVE-928) ScriptOperator does not set CLASSPATH of spawned process.

2009-11-24 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782084#action_12782084
 ] 

Carl Steinbach commented on HIVE-928:
-

Yes, I'll post an updated patch by the end of the day. Sorry for the wait.

 ScriptOperator does not set CLASSPATH of spawned process.
 -

 Key: HIVE-928
 URL: https://issues.apache.org/jira/browse/HIVE-928
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Attachments: HIVE-928, HIVE-928.patch


 ScriptOperator does not set the CLASSPATH of the spawned script process.
 The practical implication is that Java JARs added using the ADD JAR command
 will not be accessible to TRANSFORM/MAP/REDUCE operators unless the user can
 guess the location of the JAR archive on each node.
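The following is one possible shape of a fix. It is a hedged sketch only and is not the attached HIVE-928 patch. Before starting the child process, it builds a CLASSPATH from the node-local paths of the added JARs and sets it on the java.lang.ProcessBuilder environment. The sketch assumes the caller already knows where the JARs were localized on each node. ScriptClasspath, withClasspath and configure are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: expose JARs added with ADD JAR to a spawned script
// via the CLASSPATH environment variable.
final class ScriptClasspath {
    private ScriptClasspath() {}

    // Returns a copy of baseEnv with the (assumed node-local) JAR paths
    // prepended to any existing CLASSPATH, joined with the platform separator.
    static Map<String, String> withClasspath(Map<String, String> baseEnv,
                                             List<String> localJarPaths) {
        Map<String, String> env = new HashMap<>(baseEnv);
        List<String> parts = new ArrayList<>(localJarPaths);
        String existing = env.get("CLASSPATH");
        if (existing != null && !existing.isEmpty()) {
            parts.add(existing);
        }
        if (!parts.isEmpty()) {
            env.put("CLASSPATH", String.join(File.pathSeparator, parts));
        }
        return env;
    }

    // Sets the computed CLASSPATH on the builder; call this before start().
    static ProcessBuilder configure(ProcessBuilder pb, List<String> localJarPaths) {
        String cp = withClasspath(pb.environment(), localJarPaths).get("CLASSPATH");
        if (cp != null) {
            pb.environment().put("CLASSPATH", cp);
        }
        return pb;
    }
}
```

Prepending the added JARs lets them take precedence over anything already on the node's CLASSPATH. Appending instead would be the more conservative choice.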




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2009-11-24 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782120#action_12782120
 ] 

Todd Lipcon commented on HIVE-259:
--

An easy way to do this that would work for a ton of data sets would be to
essentially do a counting sort. If you have only a few thousand distinct values
in the column to be analyzed, just make a hashtable, count up how many you see,
and then in the single reducer use the histogram to figure out the percentile.
This should work great for datasets like age, and even for sets like number of
days since user signed up. For sets that are truly continuous, it would be
useful when combined with a binning UDF to discretize the values.

Sadly it doesn't handle the general case, but it would be an easy first step.
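Below is a rough sketch of the counting-sort approach, assuming integer-valued (or pre-binned) data. Each mapper builds a value-to-count map. A single reducer then merges those maps into a sorted histogram and walks the cumulative counts. The sketch uses discrete (nearest-rank) semantics. HistogramPercentile, its methods and the bin helper are hypothetical and are not part of Hive.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of histogram-based percentiles (not Hive's UDAF API).
final class HistogramPercentile {
    private HistogramPercentile() {}

    // "Map side": count occurrences of each distinct value.
    static Map<Long, Long> countValues(long[] values) {
        Map<Long, Long> counts = new HashMap<>();
        for (long v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }

    // "Reduce side": merge partial histograms into one histogram sorted by value.
    static TreeMap<Long, Long> merge(List<Map<Long, Long>> partials) {
        TreeMap<Long, Long> merged = new TreeMap<>();
        for (Map<Long, Long> partial : partials) {
            for (Map.Entry<Long, Long> e : partial.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return merged;
    }

    // Smallest value whose cumulative count reaches ceil(p * total).
    static long percentile(TreeMap<Long, Long> histogram, double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        long total = 0;
        for (long c : histogram.values()) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("empty histogram");
        }
        long target = Math.max(1L, (long) Math.ceil(p * total));
        long seen = 0;
        for (Map.Entry<Long, Long> e : histogram.entrySet()) {
            seen += e.getValue();
            if (seen >= target) {
                return e.getKey();
            }
        }
        return histogram.lastKey(); // not reached when counts are consistent
    }

    // Hypothetical binning helper for continuous data: returns the bucket
    // index i such that x lies in [i * width, (i + 1) * width).
    static long bin(double x, double width) {
        return (long) Math.floor(x / width);
    }
}
```

Reducer memory is proportional to the number of distinct values, not the number of rows. That is why this works only when the distinct count is small or the data has been binned first.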

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer

 Compute at least the 25th, 50th, and 75th percentiles.




[jira] Updated: (HIVE-947) Add run length encoding into RCFile's block header

2009-11-24 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-947:
--

Attachment: hive-947-2009-11-24.patch

 Add run length encoding into RCFile's block header 
 ---

 Key: HIVE-947
 URL: https://issues.apache.org/jira/browse/HIVE-947
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
Priority: Minor
 Attachments: hive-947-2009-11-22.patch, hive-947-2009-11-24.patch


 When RCFile constructs rows, it needs to get each column value's length by
 calling readVLong(). This should be avoided for fixed-length or mostly
 fixed-length columns.
 This change also should not affect old RCFile files: it should still work
 correctly on files written in the previous RCFile format.
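This sketch illustrates the idea under stated assumptions. It run-length encodes a column's value lengths as (length, count) pairs. A fixed-length column then collapses to a single run, and row offsets can be computed without decoding one VLong per value. This is an illustration only and does not reflect RCFile's actual header layout. LengthRle is a hypothetical name. A real change would also need a format version marker so that files written in the previous format still read correctly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical run-length encoding of per-row value lengths for one column.
final class LengthRle {
    private LengthRle() {}

    // Encodes lengths as consecutive {length, runCount} pairs.
    static List<int[]> encode(int[] lengths) {
        List<int[]> runs = new ArrayList<>();
        for (int len : lengths) {
            int last = runs.size() - 1;
            if (last >= 0 && runs.get(last)[0] == len) {
                runs.get(last)[1]++;
            } else {
                runs.add(new int[] {len, 1});
            }
        }
        return runs;
    }

    // Expands the runs back into one length per row.
    static int[] decode(List<int[]> runs) {
        int total = 0;
        for (int[] run : runs) {
            total += run[1];
        }
        int[] lengths = new int[total];
        int pos = 0;
        for (int[] run : runs) {
            for (int i = 0; i < run[1]; i++) {
                lengths[pos++] = run[0];
            }
        }
        return lengths;
    }

    // Byte offset of a row in the column's value buffer, computed run by run
    // without expanding the runs (O(1) for a fixed-length column).
    static long offsetOf(List<int[]> runs, int row) {
        long offset = 0;
        int remaining = row;
        for (int[] run : runs) {
            if (remaining < run[1]) {
                return offset + (long) remaining * run[0];
            }
            offset += (long) run[0] * run[1];
            remaining -= run[1];
        }
        throw new IndexOutOfBoundsException("row " + row);
    }
}
```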

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-953) script_broken_pipe3.q broken

2009-11-24 Thread Namit Jain (JIRA)
script_broken_pipe3.q broken


 Key: HIVE-953
 URL: https://issues.apache.org/jira/browse/HIVE-953
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Namit Jain
Assignee: Paul Yang


The negative test script_broken_pipe3.q is broken if we allow partial
consumption. For now, I have disabled partial consumption. Can you take a look?
