hyphen in hive struct field

2015-03-25 Thread Udit Mehta
Hi,

I have a hive table query:

create external table test (field1 struct<`inner-table`:string>);

I believe hyphens are disallowed, but I read that we can work around that by
putting backticks (``) around the name. Even this seems to fail, though.

Is there a way around this, or are hyphens not allowed in nested Hive types?

Thanks,
Udit


Can WebHCat show non-MapReduce jobs?

2015-03-25 Thread Xiaoyong Zhu
It seems that WebHCat can only show MapReduce jobs. For example, if I submit
a Hive on Tez job via WebHCat, I can only get the TempletonControllerJob ID
(which is a MAPREDUCE job), but I cannot get the ID of the Tez job that the
TempletonControllerJob launches.

Is this by design? Is there a way to return all types of jobs via WebHCat?

Xiaoyong



RE: Executing HQL files from JAVA application.

2015-03-25 Thread Amal Gupta
Hi Gopal,

Thanks a lot. 
Connectivity to HiveServer2 was not an issue; we were able to connect using
the example that you shared and using Beeline. The issue is executing a
script from the Java app. Maybe I missed something, but I was not able to
find an efficient and elegant way to execute Hive scripts placed at a
specific location from the Java app.

The scenario is: an app placed at location A should be able to connect to
HiveServer2 at B and execute the script TestHive.hql placed at a location on
B (say root/testProject/hive/scripts/TestHive.hql).

Regards,
Amal

-Original Message-
From: Gopal Vijayaraghavan [mailto:go...@hortonworks.com] On Behalf Of Gopal 
Vijayaraghavan
Sent: Wednesday, March 25, 2015 8:49 AM
To: user@hive.apache.org
Cc: Amal Gupta
Subject: Re: Executing HQL files from JAVA application.

Hi,

Any mechanism which bypasses schema layers for SQL is a bad idea.

See this example for how you can connect to HiveServer2 directly from Java:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBCClientSampleCode

Use the JDBC driver to access HiveServer2 through a first-class Java API.
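
For the script-file part of your question, a minimal sketch of reading an
.hql file and running its statements over JDBC (hedged: this assumes the
script is readable from the machine the Java app runs on, and the naive
split on ';' only suits simple scripts; host, user and path names are
illustrative):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HqlScriptRunner {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Read the whole script file into one string.
        String script = new String(
                Files.readAllBytes(Paths.get("/path/to/TestHive.hql")),
                StandardCharsets.UTF_8);
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://hs2-host:10000/test_db", "user", "password");
             Statement stmt = con.createStatement()) {
            // Naive statement split; breaks on semicolons inside literals.
            for (String sql : script.split(";")) {
                if (!sql.trim().isEmpty()) {
                    stmt.execute(sql.trim());
                }
            }
        }
    }
}

Note that JDBC only ships statements, not files, so if the script really
must live on the HiveServer2 box the app still needs some way to read it
(e.g. a shared mount).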

If you find any performance issues with this method, let me know and I can fix 
it.

Cheers,
Gopal

From: Amal Gupta <amal.gup...@aexp.com>
Reply-To: user@hive.apache.org
Date: Sunday, March 22, 2015 at 10:53 PM
To: user@hive.apache.org
Subject: RE: Executing HQL files from JAVA application.


Hey Mich,

 
Got any clues regarding the failure of the code that I sent?
 
I was going through the project and the code again, and I suspect mismatched
dependencies to be the culprit. I am currently trying to re-align the
dependencies as per the POM given on mvnrepository.com, to see whether a
particular configuration succeeds.

Will keep you posted on my progress. Thanks again for all the help that you
are providing. :)
 
Regards,
Amal

 
From: Amal Gupta

Sent: Sunday, March 22, 2015 7:52 AM
To: user@hive.apache.org
Subject: RE: Executing HQL files from JAVA application.


 
Hi Mich,
 
:) A coincidence. Even I am new to Hive. The test script which I am trying
to execute contains a drop and a create statement.
 
Script :-

use test_db;
DROP TABLE IF EXISTS demoHiveTable;
CREATE EXTERNAL TABLE demoHiveTable (
demoId string,
demoName string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/hive/';
 
 
Java Code: -
Not sure whether this will have an impact, but the code is part of a Spring
Batch Tasklet triggered from the Batch-Context. This tasklet runs in
parallel with other tasklets.
 
  
public RepeatStatus execute(StepContribution arg0, ChunkContext arg1)
        throws Exception {
    // Run Beeline embedded, pointing it at the script file via -f.
    String[] args = {
            "-d", BeeLine.BEELINE_DEFAULT_JDBC_DRIVER,
            "-u", "jdbc:hive2://server-name:1/test_db",
            "-n", "**", "-p", "**",
            "-f", "C://Work//test_hive.hql"};

    // Capture Beeline's output instead of letting it go to stdout.
    BeeLine beeline = new BeeLine();
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    PrintStream beelineOutputStream = new PrintStream(os);
    beeline.setOutputStream(beelineOutputStream);
    beeline.setErrorStream(beelineOutputStream);
    beeline.begin(args, null);
    String output = os.toString("UTF-8");
    System.out.println(output);

    return RepeatStatus.FINISHED;
}
 
It would be great if you could share the piece of code that worked for you.
Maybe it will give me some pointers on how to go ahead.

 
Best Regards,
Amal

 
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]

Sent: Sunday, March 22, 2015 2:58 AM
To: user@hive.apache.org
Subject: RE: Executing HQL files from JAVA application.


 
Hi Amal;
 
Coming from a relational database (Oracle, Sybase) background, I always
expect that a DDL statement like DROP TABLE has to run in its own
transaction and cannot be combined with a DML statement.
 
Now I suspect that when you run the command DROP TABLE IF EXISTS table_name;
like below in Beeline, it works:

0: jdbc:hive2://rhes564:10010/default> drop table if exists mytest;
No rows affected (0.216 seconds)

That runs in its own transaction, so it works. However, I suspect that in
Java that is not the case. Can you possibly provide your Java code so we can
see what exactly it is doing?
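
To illustrate what I mean, a sketch of issuing each statement separately
over JDBC (hedged: the connection details are copied from my Beeline prompt
above; the table, user and password are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DropThenCreate {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://rhes564:10010/default", "user", "password");
             Statement stmt = con.createStatement()) {
            // One statement per execute() call, as Beeline effectively does.
            stmt.execute("drop table if exists mytest");
            stmt.execute("create table mytest (id int)");
        }
    }
}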
 
Thanks,
 
Mich
 
http://talebzadehmich.wordpress.com
 
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and 
Coherence Cache
 

Re: Can WebHCat show non-MapReduce jobs?

2015-03-25 Thread Eugene Koifman
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Jobs
should produce all jobs (assuming the calling user has permission to see
them). templeton.Server.showJobList() has detailed JavaDoc.
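
For reference, a minimal sketch of calling that endpoint from Java (hedged:
assumes WebHCat's default port 50111 and a user.name your cluster accepts;
the host name is illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHcatJobList {
    public static void main(String[] args) throws Exception {
        // GET /templeton/v1/jobs returns a JSON array of {id, detail}.
        URL url = new URL(
                "http://webhcat-host:50111/templeton/v1/jobs?user.name=hive");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}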

From: Xiaoyong Zhu <xiaoy...@microsoft.com>
Reply-To: user@hive.apache.org
Date: Wednesday, March 25, 2015 at 5:35 AM
To: user@hive.apache.org
Subject: Can WebHCat show non-MapReduce jobs?

It seems that WebHCat can only show MapReduce jobs. For example, if I submit
a Hive on Tez job via WebHCat, I can only get the TempletonControllerJob ID
(which is a MAPREDUCE job), but I cannot get the ID of the Tez job that the
TempletonControllerJob launches.

Is this by design? Is there a way to return all types of jobs via WebHCat?

Xiaoyong



How to read Protobuffers in Hive

2015-03-25 Thread Lukas Nalezenec

Hi,
I am trying to write a SerDe + ObjectInspectors for reading protobuffers in
Hive.
I tried to use the ProtocolBuffersStructObjectInspector class from Hive, but
it last worked with the old protobuffer version 2.3.
I tried to use the ObjectInspector from Twitter's Elephant Bird, but it does
not work either.


It looks like it would help if I hadn't used DynamicMessage$Builder. The
problem is that a DynamicMessage$Builder cannot be reused: once a message is
built from the builder, the builder's internal fields field is set to null,
and the second build() call throws an NPE.


...
at 
com.lukas.AbstractProtobufStructObjectInspector.setStructFieldData_default(AbstractProtobufStructObjectInspector.java:253)
at 
com.lukas.ProtobufStructObjectInspector.setStructFieldData(ProtobufStructObjectInspector.java:61)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:325)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:324)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:630)
... 9 more
Caused by: java.lang.NullPointerException
at 
com.google.protobuf.DynamicMessage$Builder.clearField(DynamicMessage.java:386)
at 
com.google.protobuf.DynamicMessage$Builder.clearField(DynamicMessage.java:252)
at 
com.lukas.AbstractProtobufStructObjectInspector.setStructFieldData_default(AbstractProtobufStructObjectInspector.java:177)
... 13 more
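
For what it's worth, the direction I am exploring is to create a fresh
builder per record rather than reusing one. A sketch (hedged: this assumes
protobuf-java's DynamicMessage API; the class and method names are
illustrative, not from the actual SerDe):

import com.google.protobuf.Descriptors;
import com.google.protobuf.DynamicMessage;

public final class ProtobufRecords {
    private ProtobufRecords() {}

    // A new builder per call sidesteps the NPE that reuse triggers after
    // build() nulls the builder's internal field map.
    public static DynamicMessage parse(Descriptors.Descriptor descriptor,
                                       byte[] serialized) throws Exception {
        DynamicMessage.Builder builder = DynamicMessage.newBuilder(descriptor);
        builder.mergeFrom(serialized);
        return builder.build();
    }
}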



I am using Hive 0.10, but I am also interested in a solution for Hive 0.13.

Does anybody have a working protobuffer SerDe?

Thanks
Best Regards

Lukas


Re: How to read Protobuffers in Hive

2015-03-25 Thread Edward Capriolo
You may be able to use:
https://github.com/edwardcapriolo/hive-protobuf

(Use the branch not master)

This code is based on the Avro support. It works well even with nested
objects.




On Wed, Mar 25, 2015 at 12:28 PM, Lukas Nalezenec
<lukas.naleze...@firma.seznam.cz> wrote:

 Hi,
 I am trying to write a SerDe + ObjectInspectors for reading protobuffers
 in Hive. [snip]



Re: Intermittent BindException during long MR jobs

2015-03-25 Thread Krishna Rao
Thanks for the responses. In our case the port is 0, so according to the
link Ted mentioned (http://wiki.apache.org/hadoop/BindException) a collision
is highly unlikely:

"If the port is 0, then the OS is looking for any free port - so the
port-in-use and port-below-1024 problems are highly unlikely to be the cause
of the problem."

I think load may be the culprit, since the nodes are heavily used during the
times the exception occurs.

Is there any way to set or increase the timeout for the call/connection
attempt? In all cases so far it seems to happen on a call to delete a file
in HDFS. I searched through the HDFS code base but couldn't see an obvious
way to set a timeout, nor could I see one being set.
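
In the meantime, here is a sketch of what I plan to try on the client side
(hedged: I'm assuming the Hadoop 2.x ipc.client.* connect properties are
honoured on this code path; the path below is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteWithLongerConnectTimeout {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Per-attempt connect timeout (ms) and number of connect retries;
        // both can also be set cluster-wide in core-site.xml.
        conf.setInt("ipc.client.connect.timeout", 60000);
        conf.setInt("ipc.client.connect.max.retries", 20);
        try (FileSystem fs = FileSystem.get(conf)) {
            fs.delete(new Path("/tmp/example-file"), false);
        }
    }
}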

Krishna

On 28 February 2015 at 15:20, Ted Yu <yuzhih...@gmail.com> wrote:

 Krishna:
 Please take a look at:
 http://wiki.apache.org/hadoop/BindException

 Cheers

 On Thu, Feb 26, 2015 at 10:30 PM, <hadoop.supp...@visolve.com> wrote:

 Hello Krishna,



 The exception seems to be IP-specific. It might occur because no IP address
 is available on the system to assign. Double-check IP address availability
 and run the job.



 Thanks,
 S.RagavendraGanesh

 ViSolve Hadoop Support Team
 ViSolve Inc. | San Jose, California
 Website: www.visolve.com
 email: servi...@visolve.com | Phone: 408-850-2243





 From: Krishna Rao [mailto:krishnanj...@gmail.com]
 Sent: Thursday, February 26, 2015 9:48 PM
 To: user@hive.apache.org; u...@hadoop.apache.org
 Subject: Intermittent BindException during long MR jobs



 Hi,

 we occasionally run into a BindException that causes long-running jobs to
 fail.

 The stacktrace is below.

 Any ideas what this could be caused by?

 Cheers,

 Krishna

 Stacktrace:

 379969 [Thread-980] ERROR org.apache.hadoop.hive.ql.exec.Task  - Job
 Submission failed with exception 'java.net.BindException(Problem binding to
 [back10/10.4.2.10:0] java.net.BindException: Cannot assign requested
 address; For more details see:
 http://wiki.apache.org/hadoop/BindException)'
 java.net.BindException: Problem binding to [back10/10.4.2.10:0]
 java.net.BindException: Cannot assign requested address; For more details
 see: http://wiki.apache.org/hadoop/BindException
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
 at org.apache.hadoop.ipc.Client.call(Client.java:1242)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at com.sun.proxy.$Proxy10.create(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:193)
 at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at com.sun.proxy.$Proxy11.create(Unknown Source)
 at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1376)
 at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1395)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1255)
 at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1212)
 at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:276)
 at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:265)
 at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:82)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:888)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:869)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:768)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:757)
 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:558)
 at org.apache.hadoop.mapreduce.split.JobSplitWriter.createFile(JobSplitWriter.java:96)
 at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:85)
 at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:517)
 at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:487)
 at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:369)
 at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
 at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at ...

Re: Executing HQL files from JAVA application.

2015-03-25 Thread Steve Howard
I would argue that executing arbitrary code fetched from a remote server
just increases your security footprint, in terms of needing to control
another access point.

Purely out of curiosity, is there a compelling architectural reason or 
environment limitation that results in your need to do it this way?

Sent from my iPad

 On Mar 25, 2015, at 9:57 AM, Amal Gupta <amal.gup...@aexp.com> wrote:
 
 Hi Gopal,
 
 Thanks a lot. Connectivity to HiveServer2 was not an issue; the issue is
 executing a script placed at a specific location on the HiveServer2 box
 from the Java app.
 [snip]
 

Re:

2015-03-25 Thread Alan Gates

If you want off of the list send email to user-unsubscr...@hive.apache.org

Alan.


jake lawson <jacobj...@gmail.com>
March 25, 2015 at 15:45
Stop emailing me


[no subject]

2015-03-25 Thread jake lawson
Stop emailing me


Re:

2015-03-25 Thread Al Pivonka
Why
Who's emailing you
On Mar 25, 2015 6:59 PM, Alan Gates <alanfga...@gmail.com> wrote:

 If you want off of the list send email to user-unsubscr...@hive.apache.org

 Alan.

  jake lawson <jacobj...@gmail.com>
  March 25, 2015 at 15:45
 Stop emailing me




RE: Can WebHCat show non-MapReduce jobs?

2015-03-25 Thread Xiaoyong Zhu
Thanks for the reply, Eugene. However, when I list the jobs via WebHCat
(GET /templeton/v1/jobs), I get:

[{"id":"job_1427201295241_0001","detail":null},
 {"id":"job_1427201295241_0003","detail":null},
 {"id":"job_1427201295241_0005","detail":null}]

As you can see, there are three jobs: 0001, 0003 and 0005 (I submitted
three Hive on Tez jobs). In the YARN UI, however, I can see six jobs; 0002,
0004 and 0006, which do not appear in WebHCat, are shown there.

[screenshot: YARN UI listing the six applications]

Maybe something is wrong with my configuration?

Xiaoyong

From: Eugene Koifman [mailto:ekoif...@hortonworks.com]
Sent: Thursday, March 26, 2015 12:52 AM
To: user@hive.apache.org
Subject: Re: Can WebHCat show non-MapReduce jobs?

https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Jobs should 
produce all jobs (assuming the calling user has permissions to see them).
templeton.Server.showJobList() has detailed JavaDoc

From: Xiaoyong Zhu <xiaoy...@microsoft.com>
Reply-To: user@hive.apache.org
Date: Wednesday, March 25, 2015 at 5:35 AM
To: user@hive.apache.org
Subject: Can WebHCat show non-MapReduce jobs?

It seems that WebHCat can only show MapReduce jobs. For example, if I submit
a Hive on Tez job via WebHCat, I can only get the TempletonControllerJob ID
(which is a MAPREDUCE job), but I cannot get the ID of the Tez job that the
TempletonControllerJob launches.

Is this by design? Is there a way to return all types of jobs via WebHCat?

Xiaoyong