Build failed in Hudson: Hive-trunk-h0.17 #465
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/465/changes

Changes:
[athusoo] HIVE-1373. Missing connection pool plugin in Eclipse classpath. (Vinithra via athusoo)

[...truncated 11410 lines...]
[junit] (truncated JUnit log: repeated "Loading data to table ..." / "POSTHOOK: Output: ..." / "OK" setup output for the standard test tables (srcpart, srcbucket, srcbucket2, src, src1, src_sequencefile, src_thrift, src_json), followed by result diffs and completion messages for the negative compiler queries unknown_function4.q, unknown_table1.q, and unknown_table2.q; the log ends mid-run.)
Hudson build is back to normal : Hive-trunk-h0.18 #468
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/468/changes
[jira] Updated: (HIVE-895) Add SerDe for Avro serialized data
[ https://issues.apache.org/jira/browse/HIVE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-895: Assignee: (was: Carl Steinbach) Add SerDe for Avro serialized data -- Key: HIVE-895 URL: https://issues.apache.org/jira/browse/HIVE-895 Project: Hadoop Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Jeff Hammerbacher As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro data seems like a solid win. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
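As a rough illustration only (a sketch, not an existing Hive feature — the SerDe and input/output format class names below are hypothetical placeholders for whatever an eventual Avro SerDe would provide), querying Avro data through such a SerDe might look like:

CREATE TABLE avro_events (user_id BIGINT, action STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'                         -- hypothetical SerDe class
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'      -- hypothetical
            OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat';   -- hypothetical

SELECT action, COUNT(1) FROM avro_events GROUP BY action;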
[jira] Resolved: (HIVE-1170) Ivy looks for Hadoop POMs that don't exist
[ https://issues.apache.org/jira/browse/HIVE-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-1170. -- Resolution: Not A Problem This is not actually a bug as far as I can tell. Resolving as Not A Problem. Ivy looks for Hadoop POMs that don't exist -- Key: HIVE-1170 URL: https://issues.apache.org/jira/browse/HIVE-1170 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.6.0 Reporter: Carl Steinbach Assignee: Carl Steinbach
In the event that Ivy cannot satisfy the shim dependencies using archive.apache.org, our ivysettings configuration causes it to look for Hadoop POMs. This will always fail since Hadoop POMs do not exist (see HADOOP-6382).
{noformat}
ivy-retrieve-hadoop-source:
[ivy:retrieve] :: Ivy 2.0.0-rc2 - 20081028224207 :: http://ant.apache.org/ivy/ ::
:: loading settings :: file = /master/hive/ivy/ivysettings.xml
[ivy:retrieve] :: resolving dependencies :: org.apache.hadoop.hive#shims;working
[ivy:retrieve]  confs: [default]
[ivy:retrieve] :: resolution report :: resolve 953885ms :: artifacts dl 0ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
        ---------------------------------------------------------------------
[ivy:retrieve]
[ivy:retrieve] :: problems summary ::
[ivy:retrieve] WARNINGS
[ivy:retrieve]  module not found: hadoop#core;0.20.1
[ivy:retrieve]  hadoop-source: tried
[ivy:retrieve]   -- artifact hadoop#core;0.20.1!hadoop.tar.gz(source):
[ivy:retrieve]   http://archive.apache.org/dist/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
[ivy:retrieve]  apache-snapshot: tried
[ivy:retrieve]   https://repository.apache.org/content/repositories/snapshots/hadoop/core/0.20.1/core-0.20.1.pom
[ivy:retrieve]   -- artifact hadoop#core;0.20.1!hadoop.tar.gz(source):
[ivy:retrieve]   https://repository.apache.org/content/repositories/snapshots/hadoop/core/0.20.1/hadoop-0.20.1.tar.gz
[ivy:retrieve]  maven2: tried
[ivy:retrieve]   http://repo1.maven.org/maven2/hadoop/core/0.20.1/core-0.20.1.pom
[ivy:retrieve]   -- artifact hadoop#core;0.20.1!hadoop.tar.gz(source):
[ivy:retrieve]   http://repo1.maven.org/maven2/hadoop/core/0.20.1/core-0.20.1.tar.gz
[ivy:retrieve]  ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:retrieve]  ::          UNRESOLVED DEPENDENCIES         ::
[ivy:retrieve]  ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:retrieve]  :: hadoop#core;0.20.1: not found
[ivy:retrieve]  ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:retrieve] ERRORS
[ivy:retrieve]  Server access Error: Connection timed out url=http://archive.apache.org/dist/hadoop/core/hadoop-0.20.1/hadoop-0.20.1.tar.gz
[ivy:retrieve]  Server access Error: Connection timed out url=https://repository.apache.org/content/repositories/snapshots/hadoop/core/0.20.1/core-0.20.1.pom
[ivy:retrieve]  Server access Error: Connection timed out url=https://repository.apache.org/content/repositories/snapshots/hadoop/core/0.20.1/hadoop-0.20.1.tar.gz
[ivy:retrieve]  Server access Error: Connection timed out url=http://repo1.maven.org/maven2/hadoop/core/0.20.1/core-0.20.1.pom
[ivy:retrieve]  Server access Error: Connection timed out url=http://repo1.maven.org/maven2/hadoop/core/0.20.1/core-0.20.1.tar.gz
[ivy:retrieve]
[ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HIVE-1187) Implement ddldump utility for Hive Metastore
[ https://issues.apache.org/jira/browse/HIVE-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach resolved HIVE-1187. -- Resolution: Duplicate Implement ddldump utility for Hive Metastore Key: HIVE-1187 URL: https://issues.apache.org/jira/browse/HIVE-1187 Project: Hadoop Hive Issue Type: New Feature Components: Metastore Affects Versions: 0.6.0 Reporter: Carl Steinbach Assignee: Carl Steinbach Implement a ddldump utility for the Hive metastore that will generate the QL DDL necessary to recreate the state of the current metastore on another metastore instance. A major use case for this utility is migrating a metastore from one database to another, e.g. from an embedded Derby instance to a MySQL instance. The ddldump utility should support the following features:
* Ability to generate DDL for specific tables or all tables.
* Ability to specify a table name prefix for the generated DDL, which will be useful for resolving table name conflicts.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-967) Implement show create table
[ https://issues.apache.org/jira/browse/HIVE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach reassigned HIVE-967: --- Assignee: Carl Steinbach Implement show create table - Key: HIVE-967 URL: https://issues.apache.org/jira/browse/HIVE-967 Project: Hadoop Hive Issue Type: New Feature Reporter: Adam Kramer Assignee: Carl Steinbach SHOW CREATE TABLE would be very useful in cases where you are trying to figure out the partitioning and/or bucketing scheme for a table. Perhaps this could be implemented by having new tables automatically SET PROPERTIES (create_command='raw text of the create statement')? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
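For illustration, a sketch of the property-based idea from the comment above (the create_command property name comes from the comment; the SHOW CREATE TABLE output behavior is an assumption, since the feature does not exist yet):

CREATE TABLE clicks (ts BIGINT, url STRING)
PARTITIONED BY (ds STRING)
CLUSTERED BY (url) INTO 32 BUCKETS;

-- hypothetical: at creation time, Hive could stash the raw statement as a table property,
-- roughly equivalent to: ALTER TABLE clicks SET TBLPROPERTIES ('create_command' = '<raw CREATE TABLE text>');

SHOW CREATE TABLE clicks;   -- would then echo the stored statement, including partitioning and bucketing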
Build failed in Hudson: Hive-trunk-h0.19 #467
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/467/changes

Changes:
[athusoo] HIVE-1373. Missing connection pool plugin in Eclipse classpath. (Vinithra via athusoo)

[...truncated 14090 lines...]
[junit] (truncated JUnit log: repeated "Loading data to table ..." / "POSTHOOK: Output: ..." / "OK" setup output for the standard test tables (srcpart, srcbucket, srcbucket2, src, src1, src_sequencefile, src_thrift, src_json), followed by result diffs and completion messages for the negative compiler queries unknown_function4.q, unknown_table1.q, and unknown_table2.q; the log ends mid-run.)
[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
[ https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877500#action_12877500 ] Soundararajan Velu commented on HIVE-1139: -- Ning, Aravind, I have this implemented and it looks good so far; I will upload the modified version after a thorough test. All I did was copy the HashMap implementation into HashMapWrapper (leaving the existing functionality intact), so HashMapWrapper now works exactly like HashMap. I did not get to test the serialization issues yet; I will do that and update you. I think this should help with our OOM issue around GROUP BY... GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys Key: HIVE-1139 URL: https://issues.apache.org/jira/browse/HIVE-1139 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Arvind Prabhakar When a partial aggregation is performed on a mapper, a HashMap is created to keep all distinct keys in main memory. This can lead to an OOM exception when there are too many distinct keys for a particular mapper. A workaround is to set the map split size smaller so that each mapper processes fewer rows. A better solution is to use the persistent HashMapWrapper (currently used in CommonJoinOperator) to spill overflow rows to disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
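For reference, a minimal sketch of the workaround described above — smaller splits so each mapper sees fewer rows (and fewer distinct keys) — plus an earlier flush of the map-side aggregation hash table. The values are illustrative, and the split-size parameter assumes CombineHiveInputFormat-style split control; parameter names may vary by Hadoop version:

-- smaller splits => more mappers, fewer rows per mapper
SET mapred.max.split.size=64000000;
-- flush the map-side aggregation hash table at a lower memory threshold
SET hive.map.aggr.hash.percentmemory=0.3;

SELECT key, count(DISTINCT value) FROM src GROUP BY key;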
[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
[ https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877502#action_12877502 ] Soundararajan Velu commented on HIVE-1139: -- To add to the above: XMLEncoder/XMLDecoder works just fine and can handle our serde issues. GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys Key: HIVE-1139 URL: https://issues.apache.org/jira/browse/HIVE-1139 Project: Hadoop Hive Issue Type: Bug Reporter: Ning Zhang Assignee: Arvind Prabhakar When a partial aggregation is performed on a mapper, a HashMap is created to keep all distinct keys in main memory. This can lead to an OOM exception when there are too many distinct keys for a particular mapper. A workaround is to set the map split size smaller so that each mapper processes fewer rows. A better solution is to use the persistent HashMapWrapper (currently used in CommonJoinOperator) to spill overflow rows to disk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1399) Nested UDAFs cause Hive Internal Error (NullPointerException)
Nested UDAFs cause Hive Internal Error (NullPointerException) - Key: HIVE-1399 URL: https://issues.apache.org/jira/browse/HIVE-1399 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: Mayank Lahiri Fix For: 0.6.0
This query does not make real-world sense, and I'm guessing it's not even supported by HQL/SQL, but I'm pretty sure it shouldn't be causing an internal error with a NullPointerException. The table normal just has one column called val. I'm running on trunk, svn updated 5 minutes ago, ant clean package.

SELECT percentile(val, percentile(val, 0.5)) FROM normal;

FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:153)
	at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:587)
	at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:708)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:6241)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2301)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:2860)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5002)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5524)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6055)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:304)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:377)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

I've also recreated this error with a GenericUDAF I'm writing, and also with the following:

SELECT percentile(val, percentile()) FROM normal;
SELECT avg(variance(dob_year)) FROM somedata; // this makes no sense, but still a NullPointerException

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-705) Hive HBase Integration (umbrella)
[ https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-705: Summary: Hive HBase Integration (umbrella) (was: Let Hive can analyse hbase's tables) Hive HBase Integration (umbrella) - Key: HIVE-705 URL: https://issues.apache.org/jira/browse/HIVE-705 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Samuel Guo Assignee: John Sichi Fix For: 0.6.0 Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, hbase-0.20.3-test.jar, hbase-0.20.3.jar, HIVE-705.1.patch, HIVE-705.2.patch, HIVE-705.3.patch, HIVE-705.4.patch, HIVE-705.5.patch, HIVE-705.6.patch, HIVE-705.7.patch, HIVE-705_draft.patch, HIVE-705_revision806905.patch, HIVE-705_revision883033.patch, zookeeper-3.2.2.jar Add a SerDe over HBase tables so that Hive can easily analyze the data stored in HBase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1226) support filter pushdown against non-native tables
[ https://issues.apache.org/jira/browse/HIVE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1226: - Component/s: HBase Handler support filter pushdown against non-native tables - Key: HIVE-1226 URL: https://issues.apache.org/jira/browse/HIVE-1226 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 For example, HBase's scan object can take filters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
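To make the intent concrete, a sketch (table and column names are made up for illustration): a row-key predicate like the one below is the kind of filter that could be handed to the HBase scan as a start/stop row or scan filter, rather than scanning the whole table and filtering in Hive afterwards:

SELECT value
FROM hbase_kv                              -- a table declared with the HBase storage handler
WHERE key >= 'user_1000' AND key < 'user_2000';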
[jira] Updated: (HIVE-705) Hive HBase Integration (umbrella)
[ https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-705: Component/s: HBase Handler Hive HBase Integration (umbrella) - Key: HIVE-705 URL: https://issues.apache.org/jira/browse/HIVE-705 Project: Hadoop Hive Issue Type: New Feature Components: HBase Handler Affects Versions: 0.6.0 Reporter: Samuel Guo Assignee: John Sichi Fix For: 0.6.0 Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, hbase-0.20.3-test.jar, hbase-0.20.3.jar, HIVE-705.1.patch, HIVE-705.2.patch, HIVE-705.3.patch, HIVE-705.4.patch, HIVE-705.5.patch, HIVE-705.6.patch, HIVE-705.7.patch, HIVE-705_draft.patch, HIVE-705_revision806905.patch, HIVE-705_revision883033.patch, zookeeper-3.2.2.jar Add a SerDe over HBase tables so that Hive can easily analyze the data stored in HBase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1267) Make CombineHiveInputFormat work with non-native tables
[ https://issues.apache.org/jira/browse/HIVE-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1267: - Component/s: HBase Handler Make CombineHiveInputFormat work with non-native tables --- Key: HIVE-1267 URL: https://issues.apache.org/jira/browse/HIVE-1267 Project: Hadoop Hive Issue Type: Bug Components: HBase Handler, Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 As part of fixing HIVE-1257, I am making CombineHiveInputFormat punt when it sees a non-native table. I need to come up with a real fix to allow CombineHiveInputFormat to deal with native and non-native tables at the same time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-758) function to load data from hive to hbase
[ https://issues.apache.org/jira/browse/HIVE-758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-758: Component/s: UDF function to load data from hive to hbase Key: HIVE-758 URL: https://issues.apache.org/jira/browse/HIVE-758 Project: Hadoop Hive Issue Type: New Feature Components: HBase Handler, UDF Reporter: Raghotham Murthy Priority: Minor Attachments: hive-758.1.patch, hive-758.2.patch Support a query like: SELECT hbase_put('hive_hbase_table', rowid, colfamily, col, value, ts) FROM src; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1133) Refactor InputFormat and OutputFormat for Hive
[ https://issues.apache.org/jira/browse/HIVE-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1133: - Component/s: HBase Handler Serializers/Deserializers Refactor InputFormat and OutputFormat for Hive -- Key: HIVE-1133 URL: https://issues.apache.org/jira/browse/HIVE-1133 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Serializers/Deserializers Affects Versions: 0.6.0 Reporter: Zheng Shao We have run into several problems with FileInputFormat/OutputFormat in Hive. The requirements are:
R1. We want to support HBase: HIVE-806
R2. We want to selectively include files based on file names: HIVE-951
R3. We want to optionally choose to recurse on the directory structure: HIVE-1083
R4. We want to pass the filter condition into the storage (very useful for HBase, and indexed data formats)
R5. We want to pass the column selection information into the storage (already done as part of RCFile, but we can do it better)
We need to structure these requirements and the code in a way that keeps them extensible. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1222) in metastore, do not store names of inputformat/outputformat/serde for non-native tables
[ https://issues.apache.org/jira/browse/HIVE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1222: - Component/s: HBase Handler Metastore in metastore, do not store names of inputformat/outputformat/serde for non-native tables Key: HIVE-1222 URL: https://issues.apache.org/jira/browse/HIVE-1222 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Metastore, Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 Instead, store null and get them dynamically from the storage handler. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1221) model storage handler as an attribute on StorageDescriptor
[ https://issues.apache.org/jira/browse/HIVE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1221: - Component/s: HBase Handler model storage handler as an attribute on StorageDescriptor -- Key: HIVE-1221 URL: https://issues.apache.org/jira/browse/HIVE-1221 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Metastore Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 For initial work on HIVE-705, I modeled storage handler as a table property, but it should really be a first-class attribute on StorageDescriptor. We'd like to combine this metastore change with others such as HIVE-1073. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1223) support partitioning for non-native tables
[ https://issues.apache.org/jira/browse/HIVE-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1223: - Component/s: HBase Handler support partitioning for non-native tables -- Key: HIVE-1223 URL: https://issues.apache.org/jira/browse/HIVE-1223 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Metastore, Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 The exact requirements remain to be determined here, since there are a lot of possibilities for what this could mean. Using HBase as an example, one possibility would be physical partitions such as creating one HBase table per partition, whereas another would be virtual partitions such as one partition per timestamp (e.g. to provide snapshot semantics). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1224) refine interaction between views / non-native tables and execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1224: - Component/s: HBase Handler refine interaction between views / non-native tables and execution hooks Key: HIVE-1224 URL: https://issues.apache.org/jira/browse/HIVE-1224 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 I need to take a look to see what information is being passed to pre/post exec hooks for operations on views and non-native tables, and see if it is correct and sufficient for all conceivable use cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1225) enhance storage handler interface to allow for atomic operations
[ https://issues.apache.org/jira/browse/HIVE-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1225: - Component/s: HBase Handler enhance storage handler interface to allow for atomic operations Key: HIVE-1225 URL: https://issues.apache.org/jira/browse/HIVE-1225 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 For native tables, we support atomic operations such as INSERT by only moving files from tmp to the real location once the operation is complete. Some storage handlers may be able to support something equivalent; e.g. for HBase, we could purge new timestamps if the operation fails. Even if we don't go all the way to two-phase-commit, we could at least enable something that handles most simple cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1397) histogram() UDAF for a numerical column
[ https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Lahiri updated HIVE-1397: Status: Patch Available (was: Open) I've implemented and tested the algorithm. I'm running some experiments on how far from optimal (in terms of MSE) we're getting with this streaming algorithm, but as of now it seems to perform well when the number of data points is a few orders of magnitude larger than the number of bins. As an example, I'm getting good histograms when there are 100,000 data points and 20-80 histogram bins. As I noted before, there are no approximation guarantees in terms of how close to optimal the histogram is. histogram() UDAF for a numerical column --- Key: HIVE-1397 URL: https://issues.apache.org/jira/browse/HIVE-1397 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: Mayank Lahiri Assignee: Mayank Lahiri Fix For: 0.6.0 Attachments: HIVE-1397.1.patch A histogram() UDAF to generate an approximate histogram of a numerical (byte, short, double, long, etc.) column. The result is returned as a map of (x,y) histogram pairs, and can be plotted in Gnuplot using impulses (for example). The algorithm is currently adapted from "A streaming parallel decision tree algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space proportional to the number of histogram bins specified. It has no approximation guarantees, but seems to work well when there is a lot of data and a large number (e.g. 50-100) of histogram bins specified. A typical call might be: SELECT histogram(val, 10) FROM some_table; where the result would be a histogram with 10 bins, returned as a Hive map object. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
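As a usage sketch (table and column names are illustrative; the result is the map of (x,y) bin pairs described above), the UDAF composes with GROUP BY like any other aggregate:

-- one approximate 20-bin histogram per category
SELECT category, histogram(val, 20)
FROM some_table
GROUP BY category;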
[jira] Updated: (HIVE-1227) factor TableSinkOperator out of existing FileSinkOperator
[ https://issues.apache.org/jira/browse/HIVE-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1227: - Component/s: HBase Handler factor TableSinkOperator out of existing FileSinkOperator - Key: HIVE-1227 URL: https://issues.apache.org/jira/browse/HIVE-1227 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 For non-native tables, a lot of the code in FileSinkOperator is irrelevant and has to be bypassed. It would be cleaner to factor out an AbstractSinkOperator with subclasses FileSinkOperator and TableSinkOperator. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1220) accept TBLPROPERTIES on CREATE TABLE/VIEW
[ https://issues.apache.org/jira/browse/HIVE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1220: - Component/s: HBase Handler accept TBLPROPERTIES on CREATE TABLE/VIEW - Key: HIVE-1220 URL: https://issues.apache.org/jira/browse/HIVE-1220 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Query Processor Affects Versions: 0.5.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 Attachments: HIVE-1220.1.patch Currently, Hive only supports ALTER TABLE t SET TBLPROPERTIES, but does not allow specification of table properties during CREATE TABLE. We should allow properties to be set at the time a table or view is created. This is useful in general, and in particular we want to use this so that storage handler properties (see HIVE-705) unrelated to serdes can be specified here rather than in SERDEPROPERTIES. See also HIVE-1144 regarding views. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
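For illustration, the two forms side by side — the ALTER TABLE form that works today and the proposed CREATE TABLE form; the property names are only examples, with the HBase-related ones taken from HIVE-705:

-- supported today
ALTER TABLE page_views SET TBLPROPERTIES ('comment' = 'raw page view log');

-- proposed: set properties at creation time, e.g. storage handler options unrelated to the SerDe
CREATE TABLE page_views (ts BIGINT, url STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:url')
TBLPROPERTIES ('hbase.table.name' = 'page_views');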
[jira] Updated: (HIVE-1240) support ALTER TABLE on non-native tables
[ https://issues.apache.org/jira/browse/HIVE-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1240: - Component/s: HBase Handler support ALTER TABLE on non-native tables Key: HIVE-1240 URL: https://issues.apache.org/jira/browse/HIVE-1240 Project: Hadoop Hive Issue Type: Improvement Components: HBase Handler, Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 Currently this is prohibited, but at least some cases make sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1400) CombineHiveInputFormat does not set columns needed property
CombineHiveInputFormat does not set columns needed property - Key: HIVE-1400 URL: https://issues.apache.org/jira/browse/HIVE-1400 Project: Hadoop Hive Issue Type: Bug Reporter: He Yongqiang While testing a job, I found that CombineHiveInputFormat does not seem to pass the needed columns down to the underlying reader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1397) histogram() UDAF for a numerical column
[ https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Lahiri updated HIVE-1397: Attachment: HIVE-1397.1.patch histogram() UDAF for a numerical column --- Key: HIVE-1397 URL: https://issues.apache.org/jira/browse/HIVE-1397 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: Mayank Lahiri Assignee: Mayank Lahiri Fix For: 0.6.0 Attachments: HIVE-1397.1.patch A histogram() UDAF to generate an approximate histogram of a numerical (byte, short, double, long, etc.) column. The result is returned as a map of (x,y) histogram pairs, and can be plotted in Gnuplot using impulses (for example). The algorithm is currently adapted from A streaming parallel decision tree algorithm by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space proportional to the number of histogram bins specified. It has no approximation guarantees, but seems to work well when there is a lot of data and a large number (e.g. 50-100) of histogram bins specified. A typical call might be: SELECT histogram(val, 10) FROM some_table; where the result would be a histogram with 10 bins, returned as a Hive map object. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1401) Web Interface can only browse default
Web Interface can only browse default Key: HIVE-1401 URL: https://issues.apache.org/jira/browse/HIVE-1401 Project: Hadoop Hive Issue Type: New Feature Components: Web UI Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1401) Web Interface can only browse default
[ https://issues.apache.org/jira/browse/HIVE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1401: -- Attachment: HIVE-1401-1-patch.txt Web Interface can only browse default Key: HIVE-1401 URL: https://issues.apache.org/jira/browse/HIVE-1401 Project: Hadoop Hive Issue Type: New Feature Components: Web UI Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: HIVE-1401-1-patch.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Hive-HBase integration problem, asking for help
Hi guys,

I downloaded the Hive source from the SVN server, built it, and tried to run the Hive-HBase integration. It works well on all file-based Hive tables, but on the HBase-backed tables the 'insert' command cannot run successfully. The 'select' command runs fine. The error info is below:

hive> INSERT OVERWRITE TABLE hive_zsf SELECT * FROM zsf WHERE id=3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201006081948_0021, Tracking URL = http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0021
Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job -Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0021
2010-06-09 16:05:43,898 Stage-0 map = 0%, reduce = 0%
2010-06-09 16:06:12,131 Stage-0 map = 100%, reduce = 100%
Ended Job = job_201006081948_0021 with errors
Task with the most failures(4):
Task ID: task_201006081948_0021_m_00
URL: http://linux-01:50030/taskdetails.jsp?jobid=job_201006081948_0021&tipid=task_201006081948_0021_m_00
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

I created an HBase-backed table with Hive, put some data into the HBase table through the HBase shell, and can select data from it through Hive:

CREATE TABLE hive_zsf1(id int, name string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hive_zsf1");

hbase(main):001:0> scan 'hive_zsf1'
ROW    COLUMN+CELL
 1     column=cf1:val, timestamp=1276157509028, value=zsf
 2     column=cf1:val, timestamp=1276157539051, value=zzf
 3     column=cf1:val, timestamp=1276157548247, value=zw
 4     column=cf1:val, timestamp=1276157557115, value=cjl
4 row(s) in 0.0470 seconds
hbase(main):002:0>

hive> select * from hive_zsf1 where id=3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201006081948_0038, Tracking URL = http://linux-01:50030/jobdetails.jsp?jobid=job_201006081948_0038
Kill Command = /opt/hadoop/hdfs/bin/../bin/hadoop job -Dmapred.job.tracker=linux-01:9001 -kill job_201006081948_0038
2010-06-11 10:25:42,049 Stage-1 map = 0%, reduce = 0%
2010-06-11 10:25:45,090 Stage-1 map = 100%, reduce = 0%
2010-06-11 10:25:48,133 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201006081948_0038
OK
3	zw
Time taken: 13.526 seconds
hive>
[jira] Updated: (HIVE-543) provide option to run hive in local mode
[ https://issues.apache.org/jira/browse/HIVE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-543: --- Attachment: hive-543.patch.1
A few fixes for better local mode execution:
- provide an alternate log4j configuration for capturing the local mode execution log (defaulting to the Hive log4j configuration if none is provided). This cleans up the goop on the CLI but allows capturing execution-time logs in a separate location if desired.
- bypass the distributed cache for local mode submissions. Saves on HDFS time.
- some cleanup on the set/get MapRedWork code path. It seems to have been messed up after the parallel execution changes.
- getMRScratchDir now returns a local scratch dir when executing in local mode, so we don't hit HDFS unnecessarily in local mode.
- fix to FileUtils.makeQualified because of the above. There was a subtle bug in it that was causing file paths to get messed up when using local paths for intermediate data.
- bypassed query plan serialization/deserialization except in test mode. From past experience, XML serialization/deserialization is pretty expensive, and it makes no sense to subject every query to it.
provide option to run hive in local mode Key: HIVE-543 URL: https://issues.apache.org/jira/browse/HIVE-543 Project: Hadoop Hive Issue Type: Improvement Reporter: Joydeep Sen Sarma Assignee: Joydeep Sen Sarma Attachments: hive-543.patch.1 This is a little bit more than just mapred.job.tracker=local: when run in this mode, multiple jobs are an issue since they write to the same tmp directories. The following options need to be randomized (perhaps based on the query id): hadoop.tmp.dir, mapred.local.dir. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-543) provide option to run hive in local mode
[ https://issues.apache.org/jira/browse/HIVE-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joydeep Sen Sarma updated HIVE-543: --- Status: Patch Available (was: Open)
- mapred.job.tracker=local continues to be the way to set up Hive local mode.
- admins can provide appropriate mapred.local.dir and mapred.system.dir settings for Hive clients (for local mode execution). They can do this either via Hive configuration files or via Hadoop client-side-only configuration files. For regular cluster jobs, these are controlled by Hadoop server-side configuration files.
Some of the cleanups regarding randomizing local/system directories etc. for concurrent queries were already in place (via HIVE-77).
provide option to run hive in local mode Key: HIVE-543 URL: https://issues.apache.org/jira/browse/HIVE-543 Project: Hadoop Hive Issue Type: Improvement Reporter: Joydeep Sen Sarma Assignee: Joydeep Sen Sarma Attachments: hive-543.patch.1 This is a little bit more than just mapred.job.tracker=local: when run in this mode, multiple jobs are an issue since they write to the same tmp directories. The following options need to be randomized (perhaps based on the query id): hadoop.tmp.dir, mapred.local.dir. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
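For reference, a minimal sketch of setting up local mode from a Hive session along the lines described above (the directory paths are illustrative; as noted, admins can also put these in client-side configuration files):

SET mapred.job.tracker=local;
-- client-side scratch/system directories for local-mode jobs (example paths)
SET mapred.local.dir=/tmp/hive-local/mapred/local;
SET mapred.system.dir=/tmp/hive-local/mapred/system;

SELECT count(1) FROM src;   -- now runs as a local-mode MapReduce job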