hyphen in hive struct field
Hi, I have a Hive table DDL like: CREATE EXTERNAL TABLE test (field1 STRUCT<`inner-table`: STRING>); I believe hyphens are disallowed in identifiers, but I read that we can work around that by wrapping them in backticks. Even that seems to fail, though. Is there a way around this, or are hyphens simply not allowed in nested Hive fields? Thanks, Udit
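[For later readers, a minimal JDBC sketch of the usual workaround. It assumes Hive's documented identifier rules: backtick quoting can rescue a hyphenated top-level column name (with hive.support.quoted.identifiers=column, Hive 0.13+), but it does not appear to extend to STRUCT field names, which matches the failure described above, so the common fix is to rename the nested field. Host, port, credentials and table names are placeholders.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class StructFieldNames {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/default", "user", "");
             Statement stmt = conn.createStatement()) {
            // Backticks rescue a hyphenated TOP-LEVEL column name...
            stmt.execute("CREATE EXTERNAL TABLE t1 (`inner-table` STRING)");
            // ...but STRUCT field names must be plain identifiers,
            // so the usual workaround is to rename the nested field.
            stmt.execute("CREATE EXTERNAL TABLE t2 "
                    + "(field1 STRUCT<inner_table: STRING>)");
        }
    }
}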
Can WebHCat show non-MapReduce jobs?
It seems that WebHCat can only show MapReduce jobs. For example, if I submit a Hive on Tez job via WebHCat, I can only get the TempletonControllerJob ID (which is a MAPREDUCE job), but I cannot get the Tez job ID (which is launched by the TempletonControllerJob). Is this by design? Is there a way to return all types of jobs via WebHCat? Xiaoyong
RE: Executing HQL files from JAVA application.
Hi Gopal, Thanks a lot. Connectivity to HiveServer2 was not an issue; we were able to connect using the example that you shared and using Beeline. The issue is script execution from the Java app. Maybe I missed something, but I was not able to find an efficient and elegant way to execute Hive scripts placed at a specific location from the Java app. The scenario is: an app placed at location A should be able to connect to HiveServer2 at B and execute the script TestHive.hql placed at a location on B (say root/testProject/hive/scripts/TestHive.hql). Regards, Amal

-----Original Message-----
From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal Vijayaraghavan
Sent: Wednesday, March 25, 2015 8:49 AM
To: user@hive.apache.org
Cc: Amal Gupta
Subject: Re: Executing HQL files from JAVA application.

Hi, Any mechanism which bypasses schema layers for SQL is a bad idea. See this example for how you can connect to HiveServer2 directly from Java - https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBCClientSampleCode Use the JDBC driver to access HiveServer2 through a first-class Java API. If you find any performance issues with this method, let me know and I can fix it. Cheers, Gopal

From: Amal Gupta amal.gup...@aexp.com
Date: Sunday, March 22, 2015 at 10:53 PM
To: user@hive.apache.org
Subject: RE: Executing HQL files from JAVA application.

Hey Mich, Got any clues regarding the failure of the code that I sent? I was going through the project and the code again, and I suspect mismatched dependencies to be the culprit. I am currently trying to re-align the dependencies as per the POM given on mvnrepository.com while trying to see if a particular configuration succeeds. Will keep you posted on my progress. Thanks again for all the help that you are providing. :) Regards, Amal

From: Amal Gupta
Sent: Sunday, March 22, 2015 7:52 AM
To: user@hive.apache.org
Subject: RE: Executing HQL files from JAVA application.

Hi Mich, :) A coincidence, even I am new to Hive. The test script which I am trying to execute contains a drop and a create statement.

Script:

use test_db;
DROP TABLE IF EXISTS demoHiveTable;
CREATE EXTERNAL TABLE demoHiveTable (
    demoId string,
    demoName string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/hive/';

Java code (not sure whether this has an impact, but the code is part of a Spring Batch Tasklet triggered from the batch context; this tasklet runs in parallel with other tasklets):

public RepeatStatus execute(StepContribution arg0, ChunkContext arg1) throws Exception {
    String[] args = {"-d", BeeLine.BEELINE_DEFAULT_JDBC_DRIVER,
                     "-u", "jdbc:hive2://server-name:1/test_db",
                     "-n", "**", "-p", "**",
                     "-f", "C://Work//test_hive.hql"};
    BeeLine beeline = new BeeLine();
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    PrintStream beelineOutputStream = new PrintStream(os);
    beeline.setOutputStream(beelineOutputStream);
    beeline.setErrorStream(beelineOutputStream);
    beeline.begin(args, null);
    String output = os.toString("UTF-8");
    System.out.println(output);
    return RepeatStatus.FINISHED;
}

It will be great if you can share the piece of code that worked for you. Maybe it will give me some pointers on how to go ahead. Best Regards, Amal

From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: Sunday, March 22, 2015 2:58 AM
To: user@hive.apache.org
Subject: RE: Executing HQL files from JAVA application.
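[For the archive, a minimal sketch of the JDBC route Gopal suggests, under one important assumption: the .hql file must be readable by the machine running the client, so a script that lives only on the HiveServer2 host (B) would first need to be copied or mounted locally. The path, URL and credentials below are placeholders.]

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HqlScriptRunner {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // The script must be readable by this client, not by HiveServer2.
        String script = new String(Files.readAllBytes(
                Paths.get("/path/to/TestHive.hql")), "UTF-8");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hiveserver:10000/test_db", "user", "");
             Statement stmt = conn.createStatement()) {
            // Naive split on ';' -- fine for simple DDL scripts, but not
            // for statements that embed semicolons in string literals.
            for (String sql : script.split(";")) {
                if (!sql.trim().isEmpty()) {
                    stmt.execute(sql.trim());
                }
            }
        }
    }
}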
Hi Amal, Coming from a relational database (Oracle, Sybase) background, I always expect that a DDL statement like DROP TABLE has to run in its own transaction and cannot be combined with a DML statement. Now I suspect that when you run the command DROP TABLE IF EXISTS TABLE_NAME; in Beeline, like below, it works:

0: jdbc:hive2://rhes564:10010/default> drop table if exists mytest;
No rows affected (0.216 seconds)

That runs in its own transaction, so it works. However, I suspect that in Java that is not the case. Can you possibly provide your Java code to see what exactly it is doing? Thanks, Mich http://talebzadehmich.wordpress.com
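[A short sketch of Mich's point as it plays out over JDBC: each execute() call carries exactly one statement, so the drop and the create each run on their own, while a single string holding both would fail at parse time rather than run as two statements. The variable 'stmt' is assumed to be a java.sql.Statement on a Hive JDBC connection, and the table name is illustrative.]

stmt.execute("DROP TABLE IF EXISTS mytest");   // its own call, its own transaction
stmt.execute("CREATE TABLE mytest (id INT)");  // likewise
// By contrast, a combined string in one call is rejected by the parser:
// stmt.execute("DROP TABLE IF EXISTS mytest; CREATE TABLE mytest (id INT)");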
Re: Can WebHCat show non-MapReduce jobs?
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Jobs should produce all jobs (assuming the calling user has permissions to see them). templeton.Server.showJobList() has detailed JavaDoc.

From: Xiaoyong Zhu xiaoy...@microsoft.com
Date: Wednesday, March 25, 2015 at 5:35 AM
To: user@hive.apache.org
Subject: Can WebHCat show non-MapReduce jobs?
[quoted text hidden]
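[For anyone reaching this thread later, a minimal sketch of that listing call over plain HTTP. Host, port and user are placeholders; the 'showall' and 'fields' parameters come from the WebHCat jobs reference linked above.]

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHcatJobs {
    public static void main(String[] args) throws Exception {
        // 'fields=*' asks WebHCat to attach full job detail to each entry;
        // 'showall=true' includes jobs from all users the caller may see.
        URL url = new URL("http://webhcat-host:50111/templeton/v1/jobs"
                + "?user.name=hdfs&showall=true&fields=*");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON array of job ids + details
            }
        }
    }
}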
How to read Protobuffers in Hive
Hi, I am trying to write a SerDe + ObjectInspectors for reading Protobufs in Hive. I tried to use the class ProtocolBuffersStructObjectInspector from Hive, but it last worked with the old Protobuf version 2.3. I tried to use the ObjectInspector from Twitter's Elephant Bird, but it does not work either. It looks like it could have helped if I hadn't used DynamicMessage$Builder. The problem is that DynamicMessage$Builder cannot be reused: when a message is built from the builder, the builder's internal fields member is set to null, and it throws an NPE on the second build() call.

...
at com.lukas.AbstractProtobufStructObjectInspector.setStructFieldData_default(AbstractProtobufStructObjectInspector.java:253)
at com.lukas.ProtobufStructObjectInspector.setStructFieldData(ProtobufStructObjectInspector.java:61)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:325)
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:324)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:630)
... 9 more
Caused by: java.lang.NullPointerException
at com.google.protobuf.DynamicMessage$Builder.clearField(DynamicMessage.java:386)
at com.google.protobuf.DynamicMessage$Builder.clearField(DynamicMessage.java:252)
at com.lukas.AbstractProtobufStructObjectInspector.setStructFieldData_default(AbstractProtobufStructObjectInspector.java:177)
... 13 more

I am using Hive 0.10, but I am also interested in a solution for Hive 0.13. Does anybody have a working Protobuf SerDe? Thanks. Best Regards, Lukas
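[In case it helps anyone hitting the same NPE, a sketch of the obvious workaround, assuming the protobuf 2.x behavior described above: build a fresh DynamicMessage.Builder per record instead of reusing one. Only the com.google.protobuf API calls are real; the wrapper class and method are illustrative.]

import com.google.protobuf.Descriptors;
import com.google.protobuf.DynamicMessage;

public class BuilderPerRecord {
    // After build(), the builder's internal field map is gone, so any
    // further mutation (e.g. clearField) throws NullPointerException.
    // Creating a new builder per record avoids reuse entirely.
    static DynamicMessage toMessage(Descriptors.Descriptor descriptor,
                                    Descriptors.FieldDescriptor field,
                                    Object value) {
        DynamicMessage.Builder builder = DynamicMessage.newBuilder(descriptor);
        builder.setField(field, value);
        return builder.build(); // discard the builder after this call
    }
}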
Re: How to read Protobuffers in Hive
You may be able to use: https://github.com/edwardcapriolo/hive-protobuf (use the branch, not master). This code is based on the Avro support. It works well, even with nested objects.

On Wed, Mar 25, 2015 at 12:28 PM, Lukas Nalezenec lukas.naleze...@firma.seznam.cz wrote:
[quoted text hidden]
Re: Intermittent BindException during long MR jobs
Thanks for the responses. In our case the port is 0, and so, per the link http://wiki.apache.org/hadoop/BindException Ted mentioned, a collision is highly unlikely: "If the port is 0, then the OS is looking for any free port, so the port-in-use and port-below-1024 problems are highly unlikely to be the cause of the problem." I think load may be the culprit, since the nodes are heavily used during the times that the exception occurs. Is there any way to set or increase the timeout for the call/connection attempt? In all cases so far it seems to be on a call to delete a file in HDFS. I searched through the HDFS code base but couldn't see an obvious way to set a timeout, and couldn't see one being set. Krishna

On 28 February 2015 at 15:20, Ted Yu yuzhih...@gmail.com wrote: Krishna: Please take a look at: http://wiki.apache.org/hadoop/BindException Cheers

On Thu, Feb 26, 2015 at 10:30 PM, hadoop.supp...@visolve.com wrote: Hello Krishna, The exception seems to be IP-specific. It might occur due to unavailability of an IP address in the system to assign. Double-check the IP address availability and run the job. Thanks, S. RagavendraGanesh, ViSolve Hadoop Support Team, ViSolve Inc. | San Jose, California Website: www.visolve.com email: servi...@visolve.com | Phone: 408-850-2243

From: Krishna Rao [mailto:krishnanj...@gmail.com]
Sent: Thursday, February 26, 2015 9:48 PM
To: user@hive.apache.org; u...@hadoop.apache.org
Subject: Intermittent BindException during long MR jobs

Hi, we occasionally run into a BindException causing long-running jobs to fail. The stacktrace is below. Any ideas what this could be caused by? Cheers, Krishna

Stacktrace:
379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task - Job Submission failed with exception 'java.net.BindException(Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException)'
java.net.BindException: Problem binding to [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested address; For more details see: http://wiki.apache.org/hadoop/BindException
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
at org.apache.hadoop.ipc.Client.call(Client.java:1242)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at com.sun.proxy.$Proxy10.create(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)
at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at com.sun.proxy.$Proxy11.create(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.init(DFSOutputStream.java:1376)
at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)
at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)
at org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)
at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
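[For later readers: a sketch of the client-side IPC knobs that exist in Hadoop 2.x for the connection-attempt timeout asked about above. Whether they cure an intermittent "cannot assign requested address" under load is an assumption; ephemeral-port exhaustion on the busy node is another plausible cause. Values are illustrative.]

import org.apache.hadoop.conf.Configuration;

public class IpcTimeoutTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Milliseconds allowed per connection attempt (default 20000).
        conf.setInt("ipc.client.connect.timeout", 60000);
        // How many times to retry an attempt that timed out.
        conf.setInt("ipc.client.connect.max.retries.on.timeouts", 45);
        // Pass 'conf' to the FileSystem/JobClient performing the HDFS
        // call so the longer timeout actually takes effect.
    }
}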
Re: Executing HQL files from JAVA application.
I would argue that executing arbitrary code from a remote server just increases your security footprint, in terms of needing to control another access point. Purely out of curiosity, is there a compelling architectural reason or environment limitation that results in your need to do it this way?

Sent from my iPad

On Mar 25, 2015, at 9:57 AM, Amal Gupta amal.gup...@aexp.com wrote:
[quoted text hidden]
Re:
If you want off of the list, send email to user-unsubscr...@hive.apache.org. Alan.

jake lawson jacobj...@gmail.com, March 25, 2015 at 15:45: Stop emailing me
[no subject]
Stop emailing me
Re:
Why? Who's emailing you? On Mar 25, 2015 6:59 PM, Alan Gates alanfga...@gmail.com wrote: If you want off of the list, send email to user-unsubscr...@hive.apache.org. Alan. jake lawson jacobj...@gmail.com, March 25, 2015 at 15:45: Stop emailing me
RE: Can WebHCat show non-MapReduce jobs?
Thanks for the reply, Eugene. However, when I list the jobs via WebHCat (via /templeton/v1/jobs), I get:

[{"id":"job_1427201295241_0001","detail":null},{"id":"job_1427201295241_0003","detail":null},{"id":"job_1427201295241_0005","detail":null}]

As you can see, there are three jobs there: 0001, 0003 and 0005, one for each of the three Hive on Tez jobs I submitted. In the YARN UI, however, I can see six jobs: 0002, 0004 and 0006 (which do not appear in WebHCat) are also shown there.

[screenshot of the YARN UI showing six applications]

Maybe something is wrong with my configuration? Xiaoyong

From: Eugene Koifman [mailto:ekoif...@hortonworks.com]
Sent: Thursday, March 26, 2015 12:52 AM
To: user@hive.apache.org
Subject: Re: Can WebHCat show non-MapReduce jobs?
[quoted text hidden]
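[If the immediate goal is just to see the Tez applications, one option outside WebHCat is the YARN client API, which reports every application type the ResourceManager knows about. A minimal sketch, assuming the Hadoop configuration is on the classpath:]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListAllYarnApps {
    public static void main(String[] args) throws Exception {
        // Query the ResourceManager directly; it sees the TEZ
        // applications that the WebHCat job listing misses.
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration());
        yarn.start();
        try {
            for (ApplicationReport app : yarn.getApplications()) {
                System.out.println(app.getApplicationId() + "\t"
                        + app.getApplicationType() + "\t" + app.getName());
            }
        } finally {
            yarn.stop();
        }
    }
}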