Hive jdbc connector
Hi sir, which Java versions does the Hive JDBC connector support? Thanks, Vijay S.
[jira] [Created] (HIVE-13817) Allow DNS CNAME ALIAS Resolution from apache hive beeline JDBC URL to allow for failover
Vijay Singh created HIVE-13817: -- Summary: Allow DNS CNAME ALIAS Resolution from apache hive beeline JDBC URL to allow for failover Key: HIVE-13817 URL: https://issues.apache.org/jira/browse/HIVE-13817 Project: Hive Issue Type: New Feature Components: Beeline Affects Versions: 1.2.1 Reporter: Vijay Singh Currently, in the case of BDR clusters, DNS CNAME alias based connections fail: _HOST resolves to the exact endpoint specified in the connection string, which may not be the intended SPN for Kerberos based on a reverse DNS lookup. Consequently, this JIRA proposes that a client-specific setting be used to resolve _HOST from the CNAME DNS alias to the A record entry on the fly in beeline. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
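A minimal sketch of the client-side resolution step this JIRA describes, using only java.net (the class and method names here are invented for illustration, not Hive's actual implementation; note that getCanonicalHostName performs a reverse lookup and may return the literal IP address if that lookup fails):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class CnameResolver {
    /**
     * Resolve a (possibly CNAME) hostname to its canonical name, as a
     * Kerberos client would need before substituting _HOST in an SPN.
     * Falls back to the original host if resolution fails.
     */
    public static String canonicalHost(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return host;
        }
    }

    public static void main(String[] args) {
        // For a CNAME alias such as "hs2.example.com", this would print
        // the A-record hostname the alias currently points at.
        System.out.println(canonicalHost("localhost"));
    }
}
```

With a hook like this in beeline, the JDBC URL could keep the stable CNAME for failover while Kerberos still sees the correct per-host SPN.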
Fwd: problem in beeline query execution
$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:183)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineJerseyRetryFilter.handle(TimelineClientImpl.java:245)
at com.sun.jersey.api.client.Client.handle(Client.java:648)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingObject(TimelineClientImpl.java:474)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:323)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$1.run(TimelineClientImpl.java:320)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPosting(TimelineClientImpl.java:320)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:305)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.handleEvents(ATSHistoryLoggingService.java:346)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService.access$700(ATSHistoryLoggingService.java:53)
at org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService$1.run(ATSHistoryLoggingService.java:190)
at java.lang.Thread.run(Thread.java:745)
2016-04-27 13:02:23,212 [INFO] [HistoryEventHandlingThread] |util.ExitUtil|: Exiting with status -1
2016-04-27 13:02:23,214 [INFO] [Thread-3] |app.DAGAppMaster|: DAGAppMasterShutdownHook invoked
2016-04-27 13:02:23,214 [INFO] [Thread-3] |app.DAGAppMaster|: DAGAppMaster received a signal. Signaling TaskScheduler
2016-04-27 13:02:23,214 [INFO] [Thread-3] |rm.TaskSchedulerEventHandler|: TaskScheduler notified that iSignalled was : true
2016-04-27 13:02:23,376 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Registered webapp guice modules
We are using HDP 2.3. Please help me with this. -- Thanks, Vijay
Re: [ANNOUNCE] New Hive Committer - Wei Zheng
Congrats Wei Zheng!! On Mar 10, 2016 6:57 AM, "Vikram Dixit K" wrote: > The Apache Hive PMC has voted to make Wei Zheng a committer on the Apache > Hive Project. Please join me in congratulating Wei. > > Thanks > Vikram. >
[jira] [Created] (HIVE-11365) Enable Multifactor authentication in HiveServer2 for LDAP based authentication
Vijay Singh created HIVE-11365: -- Summary: Enable Multifactor authentication in HiveServer2 for LDAP based authentication Key: HIVE-11365 URL: https://issues.apache.org/jira/browse/HIVE-11365 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 1.2.0 Reporter: Vijay Singh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-2843) UDAF to convert an aggregation to a map
[ https://issues.apache.org/jira/browse/HIVE-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay Ratnagiri updated HIVE-2843: -- Labels: UDAF features udf (was: features udf) > UDAF to convert an aggregation to a map > --- > > Key: HIVE-2843 > URL: https://issues.apache.org/jira/browse/HIVE-2843 > Project: Hive > Issue Type: New Feature > Components: UDF >Affects Versions: 0.9.0, 0.10.0 >Reporter: David Worms >Priority: Minor > Labels: UDAF, features, udf > Attachments: HIVE-2843.1.patch.txt, HIVE-2843.D8745.1.patch, > hive-2843-dev.git.patch > > > I propose the addition of two new Hive UDAFs to help with maps in Apache Hive. > The source code is available on GitHub at https://github.com/wdavidw/hive-udf > in two Java classes: "UDAFToMap" and "UDAFToOrderedMap". The first function > converts an aggregation into a map and internally uses a Java `HashMap`. > The second function extends the first one: it converts an aggregation into an > ordered map and internally uses a Java `TreeMap`. They both extend the > `AbstractGenericUDAFResolver` class. > Also, I have covered the motivations and usage of these UDAFs in a blog post > at http://adaltas.com/blog/2012/03/06/hive-udaf-map-conversion/ > The full patch is available with tests as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
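The HashMap/TreeMap distinction described above is what gives the two UDAFs their unordered vs. ordered behavior. A plain-Java sketch of the underlying idea (this is illustrative only, not the actual Hive UDAF code, which lives in the linked GitHub repo):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ToMapSketch {
    // Aggregate (key, value) rows into a map, as UDAFToMap conceptually does.
    // A HashMap makes no guarantee about key iteration order.
    public static Map<String, Integer> toMap(String[] keys, int[] values) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < keys.length; i++) m.put(keys[i], values[i]);
        return m;
    }

    // UDAFToOrderedMap's variant: a TreeMap keeps keys in sorted order.
    public static Map<String, Integer> toOrderedMap(String[] keys, int[] values) {
        Map<String, Integer> m = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) m.put(keys[i], values[i]);
        return m;
    }

    public static void main(String[] args) {
        String[] k = {"b", "a", "c"};
        int[] v = {2, 1, 3};
        // TreeMap iterates keys in sorted order: {a=1, b=2, c=3}
        System.out.println(toOrderedMap(k, v));
    }
}
```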
[jira] [Resolved] (HIVE-6352) Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created when using MySQL Metastore
[ https://issues.apache.org/jira/browse/HIVE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naga Vijay resolved HIVE-6352. -- Resolution: Not A Problem Not a Problem, as we can avoid creation of derby.log and TempStatsStore by setting hive.stats.autogather to false. > Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created when using > MySQL Metastore > --- > > Key: HIVE-6352 > URL: https://issues.apache.org/jira/browse/HIVE-6352 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.12.0 > Environment: hadoop 1.2.1 > hive 0.12.0 >Reporter: Naga Vijay > Attachments: th.zip > > > Hi, > I am facing this situation for the below mentioned hql file - > > Hive MySQL Metastore is used for table t1 > > Hive MySQL Metastore is not used for table t2 (derby.log is created in the > > directory) > -- > -- database pp_test_hive_metastore > drop database if exists pp_test_hive_metastore cascade; > create database pp_test_hive_metastore; > use pp_test_hive_metastore; > -- table t1 > create table t1 ( id int, name string ); > LOAD DATA LOCAL INPATH 'testHiveMetastore.txt' OVERWRITE INTO TABLE t1; > select * from t1; > select count(*) from t1; > -- table t2 > create table t2 ( id int, name string ); > INSERT OVERWRITE TABLE t2 SELECT t.* FROM t1 t; > select * from t2; > select count(*) from t2; > -- done > quit; > --- > Testing Procedure : Comment/Uncomment the lines for table t2 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
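The workaround named in the resolution can be applied globally in hive-site.xml (or per-session with `set hive.stats.autogather=false;`):

```xml
<!-- Disable automatic stats gathering so the embedded Derby-backed
     TempStatsStore (and derby.log) are not created during INSERTs,
     even when the metastore itself is backed by MySQL. -->
<property>
  <name>hive.stats.autogather</name>
  <value>false</value>
</property>
```

Note this trades away automatically gathered table statistics, so it is a workaround rather than a fix.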
[jira] [Updated] (HIVE-6352) Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created when using MySQL Metastore
[ https://issues.apache.org/jira/browse/HIVE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naga Vijay updated HIVE-6352: - Summary: Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created when using MySQL Metastore (was: Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created) > Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created when using > MySQL Metastore > --- > > Key: HIVE-6352 > URL: https://issues.apache.org/jira/browse/HIVE-6352 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.12.0 > Environment: hadoop 1.2.1 > hive 0.12.0 >Reporter: Naga Vijay > Attachments: th.zip > > > Hi, > I am facing this situation for the below mentioned hql file - > > Hive MySQL Metastore is used for table t1 > > Hive MySQL Metastore is not used for table t2 (derby.log is created in the > > directory) > -- > -- database pp_test_hive_metastore > drop database if exists pp_test_hive_metastore cascade; > create database pp_test_hive_metastore; > use pp_test_hive_metastore; > -- table t1 > create table t1 ( id int, name string ); > LOAD DATA LOCAL INPATH 'testHiveMetastore.txt' OVERWRITE INTO TABLE t1; > select * from t1; > select count(*) from t1; > -- table t2 > create table t2 ( id int, name string ); > INSERT OVERWRITE TABLE t2 SELECT t.* FROM t1 t; > select * from t2; > select count(*) from t2; > -- done > quit; > --- > Testing Procedure : Comment/Uncomment the lines for table t2 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6352) Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created
[ https://issues.apache.org/jira/browse/HIVE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naga Vijay updated HIVE-6352: - Attachment: th.zip Steps to test ... 0. have a proper hadoop, hive (with mysql metastore) setup 1. unzip attached th.zip 2. run bash script in th12a dir - notice derby.log and TempStatsStore dir creation in th12a dir - this is the issue 3. run bash script in th12b dir - to verify table t2 has data in it (you can verify in HDFS as well) Issue is - derby.log and TempStatsStore dir are created when using mysql metastore. > Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created > > > Key: HIVE-6352 > URL: https://issues.apache.org/jira/browse/HIVE-6352 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.12.0 > Environment: hadoop 1.2.1 > hive 0.12.0 >Reporter: Naga Vijay > Attachments: th.zip > > > Hi, > I am facing this situation for the below mentioned hql file - > > Hive MySQL Metastore is used for table t1 > > Hive MySQL Metastore is not used for table t2 (derby.log is created in the > > directory) > -- > -- database pp_test_hive_metastore > drop database if exists pp_test_hive_metastore cascade; > create database pp_test_hive_metastore; > use pp_test_hive_metastore; > -- table t1 > create table t1 ( id int, name string ); > LOAD DATA LOCAL INPATH 'testHiveMetastore.txt' OVERWRITE INTO TABLE t1; > select * from t1; > select count(*) from t1; > -- table t2 > create table t2 ( id int, name string ); > INSERT OVERWRITE TABLE t2 SELECT t.* FROM t1 t; > select * from t2; > select count(*) from t2; > -- done > quit; > --- > Testing Procedure : Comment/Uncomment the lines for table t2 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6352) Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created
[ https://issues.apache.org/jira/browse/HIVE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889133#comment-13889133 ] Naga Vijay commented on HIVE-6352: -- Modified the title of this JIRA to reflect the issue after further testing. Here's the test setup - > find . -type f ./th12a/testHiveMetastore.hql ./th12a/testHiveMetastore.sh ./th12a/testHiveMetastore.txt ./th12b/testHiveMetastore.hql ./th12b/testHiveMetastore.sh > Ran script in th12a dir - noticed derby.log and TempStatsStore dir being > created in th12a dir > Ran script in th12b dir - table t2 has data, so MySQL Metastore is used ; > verified in HDFS as well So, the issue is - derby.log and TempStatsStore dir are created in th12a dir when using MySQL Metastore > Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created > > > Key: HIVE-6352 > URL: https://issues.apache.org/jira/browse/HIVE-6352 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.12.0 > Environment: hadoop 1.2.1 > hive 0.12.0 >Reporter: Naga Vijay > > Hi, > I am facing this situation for the below mentioned hql file - > > Hive MySQL Metastore is used for table t1 > > Hive MySQL Metastore is not used for table t2 (derby.log is created in the > > directory) > -- > -- database pp_test_hive_metastore > drop database if exists pp_test_hive_metastore cascade; > create database pp_test_hive_metastore; > use pp_test_hive_metastore; > -- table t1 > create table t1 ( id int, name string ); > LOAD DATA LOCAL INPATH 'testHiveMetastore.txt' OVERWRITE INTO TABLE t1; > select * from t1; > select count(*) from t1; > -- table t2 > create table t2 ( id int, name string ); > INSERT OVERWRITE TABLE t2 SELECT t.* FROM t1 t; > select * from t2; > select count(*) from t2; > -- done > quit; > --- > Testing Procedure : Comment/Uncomment the lines for table t2 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6352) Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created
[ https://issues.apache.org/jira/browse/HIVE-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naga Vijay updated HIVE-6352: - Summary: Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created (was: Hive 0.11 & Hive 0.12 - Hive MySQL Metastore is not used for INSERT) > Hive 0.11 & Hive 0.12 - derby.log and TempStatsStore are created > > > Key: HIVE-6352 > URL: https://issues.apache.org/jira/browse/HIVE-6352 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 0.12.0 > Environment: hadoop 1.2.1 > hive 0.12.0 >Reporter: Naga Vijay > > Hi, > I am facing this situation for the below mentioned hql file - > > Hive MySQL Metastore is used for table t1 > > Hive MySQL Metastore is not used for table t2 (derby.log is created in the > > directory) > -- > -- database pp_test_hive_metastore > drop database if exists pp_test_hive_metastore cascade; > create database pp_test_hive_metastore; > use pp_test_hive_metastore; > -- table t1 > create table t1 ( id int, name string ); > LOAD DATA LOCAL INPATH 'testHiveMetastore.txt' OVERWRITE INTO TABLE t1; > select * from t1; > select count(*) from t1; > -- table t2 > create table t2 ( id int, name string ); > INSERT OVERWRITE TABLE t2 SELECT t.* FROM t1 t; > select * from t2; > select count(*) from t2; > -- done > quit; > --- > Testing Procedure : Comment/Uncomment the lines for table t2 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6352) Hive 0.11 & Hive 0.12 - Hive MySQL Metastore is not used for INSERT
Naga Vijay created HIVE-6352: Summary: Hive 0.11 & Hive 0.12 - Hive MySQL Metastore is not used for INSERT Key: HIVE-6352 URL: https://issues.apache.org/jira/browse/HIVE-6352 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Environment: hadoop 1.2.1 hive 0.12.0 Reporter: Naga Vijay Hi, I am facing this situation for the below mentioned hql file - > Hive MySQL Metastore is used for table t1 > Hive MySQL Metastore is not used for table t2 (derby.log is created in the > directory) -- -- database pp_test_hive_metastore drop database if exists pp_test_hive_metastore cascade; create database pp_test_hive_metastore; use pp_test_hive_metastore; -- table t1 create table t1 ( id int, name string ); LOAD DATA LOCAL INPATH 'testHiveMetastore.txt' OVERWRITE INTO TABLE t1; select * from t1; select count(*) from t1; -- table t2 create table t2 ( id int, name string ); INSERT OVERWRITE TABLE t2 SELECT t.* FROM t1 t; select * from t2; select count(*) from t2; -- done quit; --- Testing Procedure : Comment/Uncomment the lines for table t2 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-1975) "insert overwrite directory" Not able to insert data with multi level directory path
[ https://issues.apache.org/jira/browse/HIVE-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831862#comment-13831862 ] Vijay Ratnagiri commented on HIVE-1975: --- Hey guys, I'm using Hive 0.11.0 and I just verified that I'm facing this exact problem. I first tried to ask Hive to create a multilevel path and got: "return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask". When I switched to a simple one-level directory, my query succeeded and I could see my data written out. Can anyone else corroborate? Thanks! > "insert overwrite directory" Not able to insert data with multi level > directory path > > > Key: HIVE-1975 > URL: https://issues.apache.org/jira/browse/HIVE-1975 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.5.0 > Environment: Hadoop 0.20.1, Hive0.5.0 and SUSE Linux Enterprise > Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5). >Reporter: Chinna Rao Lalam >Assignee: Chinna Rao Lalam > Fix For: 0.8.0 > > Attachments: HIVE-1975.1.patch, HIVE-1975.2.patch, HIVE-1975.3.patch, > HIVE-1975.patch > > > Below query execution fails > Ex: > {noformat} >insert overwrite directory '/HIVEFT25686/chinna/' select * from dept_j; > {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice
[ https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831625#comment-13831625 ] Vijay Ratnagiri commented on HIVE-3682: --- Hey guys, I was really delighted to find that the export finally supported choosing the format, but unfortunately my delight was short-lived when I discovered that this feature is supported only for 'insert overwrite LOCAL directory' and not when exporting to an HDFS directory. I get a syntax/parse error when I try to export to an HDFS directory with a custom row format. How come this feature was implemented like this? If this wasn't intentional, does this warrant reopening the ticket? Thanks! > when output hive table to file,users should could have a separator of their > own choice > -- > > Key: HIVE-3682 > URL: https://issues.apache.org/jira/browse/HIVE-3682 > Project: Hive > Issue Type: New Feature > Components: CLI >Affects Versions: 0.8.1 > Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 > 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux > java version "1.6.0_25" > hadoop-0.20.2-cdh3u0 > hive-0.8.1 >Reporter: caofangkun >Assignee: Sushanth Sowmyan > Fix For: 0.11.0 > > Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, > HIVE-3682.D10275.2.patch, HIVE-3682.D10275.3.patch, HIVE-3682.D10275.4.patch, > HIVE-3682.D10275.4.patch.for.0.11, HIVE-3682.with.serde.patch > > > By default, when outputting a Hive table to a file, columns of the Hive table are > separated by the ^A character (that is, \001). > But users should have the right to set a separator of their own choice. 
> Usage Example: > create table for_test (key string, value string); > load data local inpath './in1.txt' into table for_test > select * from for_test; > UT-01:default separator is \001 line separator is \n > insert overwrite local directory './test-01' > select * from src ; > create table array_table (a array, b array) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\t' > COLLECTION ITEMS TERMINATED BY ','; > load data local inpath "../hive/examples/files/arraytest.txt" overwrite into > table table2; > CREATE TABLE map_table (foo STRING , bar MAP) > ROW FORMAT DELIMITED > FIELDS TERMINATED BY '\t' > COLLECTION ITEMS TERMINATED BY ',' > MAP KEYS TERMINATED BY ':' > STORED AS TEXTFILE; > UT-02:defined field separator as ':' > insert overwrite local directory './test-02' > row format delimited > FIELDS TERMINATED BY ':' > select * from src ; > UT-03: line separator DO NOT ALLOWED to define as other separator > insert overwrite local directory './test-03' > row format delimited > FIELDS TERMINATED BY ':' > select * from src ; > UT-04: define map separators > insert overwrite local directory './test-04' > row format delimited > FIELDS TERMINATED BY '\t' > COLLECTION ITEMS TERMINATED BY ',' > MAP KEYS TERMINATED BY ':' > select * from src; -- This message was sent by Atlassian JIRA (v6.1#6144)
Re: Problem with non-generic UDAFs and sub queries
I dug into this further and it doesn't look like it has anything to do with my own UDAFs. All the built-in UDAFs seem to have the same problem. The query below always fails with the exception below. select avg(c) from (select c from test) tmp; Unless I'm missing something obvious, this seems like a serious bug to me. Thanks, Vijay On Fri, Jul 29, 2011 at 1:55 PM, Vijay wrote: > Hi, I'm not sure if this is a known problem, but when I use a > non-generic UDAF (either the examples under contrib or my own) within > a simple query it works fine; if I use it over a column from a > subquery, execution fails with the exceptions below. > > Working query: select myavg(players) from test; > Failing query: select myavg(players) from (select players from test > order by day) tmp; > > java.lang.RuntimeException: Error in configuring object > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:426) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) > at org.apache.hadoop.mapred.Child.main(Child.java:170) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) > ... 5 more > Caused by: java.lang.RuntimeException: Reduce operator initialization failed > at > org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:157) > ... 
10 more > Caused by: java.lang.RuntimeException: cannot find field value from [0:_col0] > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321) > at > org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119) > at > org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:252) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) > at > org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) > at > org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40) > at > org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) > at > org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:150) > ... 10 more > > Any help is appreciated! >
Problem with non-generic UDAFs and sub queries
Hi, I'm not sure if this is a known problem, but when I use a non-generic UDAF (either the examples under contrib or my own) within a simple query it works fine; if I use it over a column from a subquery, execution fails with the exceptions below. Working query: select myavg(players) from test; Failing query: select myavg(players) from (select players from test order by day) tmp; java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:426) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411) at org.apache.hadoop.mapred.Child.main(Child.java:170) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 5 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:157) ... 
10 more Caused by: java.lang.RuntimeException: cannot find field value from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82) at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:252) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:62) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:389) at org.apache.hadoop.hive.ql.exec.ExtractOperator.initializeOp(ExtractOperator.java:40) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) at org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:150) ... 10 more Any help is appreciated!
Questions about Hive Database/Schema support
Hive 0.6 has support for multiple databases/schemas. Is this feature mature enough to be used in production? Are there any particular features known to not work with databases (I know you cannot run queries using multiple databases at the same time)? Currently, there doesn't seem to be an easy way to move existing tables into a new database from the CLI, but it should be possible to do this directly by modifying the metastore, right? Is there anything to watch out for? Thanks, Vijay
Question about Transform and M/R scripts
Hi, I'm trying this use case: do a simple select from an existing table and pass the results through a reduce script to do some analysis. The table has web logs, so the select uses a pseudo user ID as the key and the rest of the data as values. My expectation is that a single reduce script should receive all logs for a given user so that I can do some path-based analysis. Are there any issues with this idea so far? When I try it though, Hive is not doing what I'd expect. The particular query is not generating any reduce tasks at all. Here's a sample query: FROM (SELECT userid, time, url FROM weblogs) weblogs REDUCE weblogs.userid, weblogs.time, weblogs.url USING 'counter.pl' AS user, count; Thanks, Vijay
Re: Hive queries consuming 100% cpu
Sorry, I should've given more details. The query was limited by a partition range; I just omitted the WHERE clause in the mail. The table is not that big. For each day, there is one gzipped file. The largest file is about 250MB (close to 2GB uncompressed). I did intend to count, and that was just a test, since I wanted to run a query that did the most minimal logic/processing. Here's a test I ran now. The query is getting count(1) for 8 days. It spawned 8 maps as expected. The maps run for anywhere between 42 and 69 seconds (which may or may not be right; I need to check that). It spawned only one reduce task. The reducer ran for 117 seconds, which seems long for this query. On Thu, Feb 3, 2011 at 2:31 PM, Viral Bajaria wrote: > Hey Vijay, > You can go to the mapred UI, which normally runs on port 50030 of the namenode, > and see how many map jobs got created for your submitted query. > You said that the events table has daily partitions, but the example query > that you have does not prune the partitions by specifying a WHERE clause. So > I have the following questions: > 1) how big is the table (you can just do a hadoop dfs -dus > ? how many partitions ? > 2) do you really intend to count the number of events across all days ? > 3) could you build a query which computes over 1-5 day(s) and persists the > data in a separate table for consumption later on ? > Based on your node configuration, I am just guessing the amount of data to > process is too large and hence the high CPU. > Thanks, > Viral > On Thu, Feb 3, 2011 at 12:49 PM, Vijay wrote: >> >> Hi, >> >> The simplest of Hive queries seem to be consuming 100% CPU. This is >> with a small 4-node cluster. The machines are pretty beefy (16 cores >> per machine, tons of RAM, 16 M+R maximum tasks configured, 1GB RAM for >> mapred.child.java.opts, etc). A simple query like "select count(1) >> from events" (where the events table has daily partitions of log files >> in gzipped file format). 
While this is probably too generic a question >> and there is a bunch of investigation we need to do, are there any >> specific areas for me to look at? Has anyone seen anything like this >> before? Also, are there any tools or easy options to profile Hive >> query execution? >> >> Thanks in advance, >> Vijay > >
Hive queries consuming 100% cpu
Hi, The simplest of Hive queries seem to be consuming 100% CPU. This is with a small 4-node cluster. The machines are pretty beefy (16 cores per machine, tons of RAM, 16 M+R maximum tasks configured, 1GB RAM for mapred.child.java.opts, etc). An example is a simple query like "select count(1) from events" (where the events table has daily partitions of log files in gzipped file format). While this is probably too generic a question and there is a bunch of investigation we need to do, are there any specific areas for me to look at? Has anyone seen anything like this before? Also, are there any tools or easy options to profile Hive query execution? Thanks in advance, Vijay
Hive storage handler using JDBC
Hi, The storage handler mechanism seems like an excellent way to support mixing Hive with a traditional database via a generic JDBC storage handler. While that may not always be the best thing to do, is there any work targeted at this integration? Are there any issues or problems preventing such an integration? Any ideas/suggestions for implementation are also welcome! P.S. I think I've been posting this to the wrong alias and never saw a response. Sorry if you've already seen it. Thanks, Vijay
Storage Handler using JDBC
The storage handler mechanism seems like an excellent way to support mixing hive with a traditional database using a generic JDBC storage handler. While that may not always be the best thing to do, is there any work targeted at this integration? Are there any issues or problems preventing such an integration? Thanks, Vijay
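One core piece such a generic JDBC storage handler would need is turning a projected column list and an optional residual filter into the SELECT statement pushed down to the remote database. A rough, hypothetical sketch (the class and method names are invented for illustration, not part of any Hive API):

```java
public class JdbcQueryBuilder {
    /**
     * Build the SELECT statement a generic JDBC storage handler might
     * push down to the backing database for a table scan, reading only
     * the projected columns and applying an optional filter expression.
     */
    public static String buildScanQuery(String table, String[] columns, String filter) {
        StringBuilder sb = new StringBuilder("SELECT ");
        sb.append(String.join(", ", columns));
        sb.append(" FROM ").append(table);
        if (filter != null && !filter.isEmpty()) {
            sb.append(" WHERE ").append(filter);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildScanQuery("events",
                new String[]{"userid", "url"}, "day = '2011-02-03'"));
        // SELECT userid, url FROM events WHERE day = '2011-02-03'
    }
}
```

The resulting string would then be executed through a plain java.sql Connection/PreparedStatement, with each ResultSet row deserialized into a Hive row by the handler's SerDe.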