Hive unwanted directories creation issue

2014-08-25 Thread Valluri, Sathish
We are creating an external table in Hive, and if the location path (say /testdata, as shown below) is not present in HDFS, Hive creates the '/testdata' dummy folder.

Is there any option in Hive, or any other way, to stop it from creating dummy directories when the location folder does not exist?

So we end up creating many unwanted dummy directories when the data is not present in HDFS for the many partitions we add after creating the table.



CREATE EXTERNAL TABLE testTable ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES 
('avro.schema.literal'='{ ') STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 
'/testdata/';
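Pending a Hive-side option, one client-side workaround is to test for the location before issuing the DDL, so Hive never gets the chance to create the dummy directory. A minimal sketch under stated assumptions: a local path stands in for the HDFS check (on a real cluster this would go through `hadoop fs -test -d` or an HDFS client), and `run_ddl` is a hypothetical helper that submits the statement.

```python
import os

def create_external_table_if_location_exists(location, run_ddl):
    """Issue the CREATE EXTERNAL TABLE only when `location` already exists,
    so no dummy directory ever needs to be created for it."""
    if not os.path.isdir(location):  # stand-in for: hadoop fs -test -d <location>
        return False                 # skip the DDL entirely
    run_ddl("CREATE EXTERNAL TABLE testTable ... LOCATION '%s'" % location)
    return True
```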



Regards

Sathish Valluri





Hive Avro union data access

2014-05-29 Thread Valluri, Sathish
Hi,

 

I have a Hive table whose alias_host column is declared with an Avro union of 3 different types, as shown below: (array of string, string, null).

 

CREATE EXTERNAL TABLE array_tests ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES
('avro.schema.literal'='{"name":"sessions","type":"record","fields":
  [{"default":null,"name":"alias_host","type":
    [{"type":"array","items":"string"},"string","null"]}]}')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/arrayTests';

 

How can I access and query the contents of this column in a WHERE clause?

Queries like the one below work when the datatype is not a union, but once I declare the column as a union they fail.

E.g.: select alias_host from array_tests where alias_host like '%test%' limit 1000;

Error: Error while processing statement: FAILED: SemanticException [Error 
10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of 
EQUAL  is expected to a primitive type, but union is found 
(state=42000,code=10016) 

 

Can anyone suggest how to access and query the contents of union data types?
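One workaround worth sketching (my own suggestion, not a documented Hive feature of that era): normalize the union to a single primitive type before filtering, e.g. via a staging step or a custom UDF that renders every branch as a string, so that LIKE sees a primitive. The coercion logic, mimicked in plain Python for the [array of string, string, null] union above:

```python
def union_to_string(value):
    """Render one value of the union [array<string>, string, null] as a string."""
    if value is None:            # the "null" branch
        return ""
    if isinstance(value, list):  # the array<string> branch
        return ",".join(value)
    return value                 # the plain string branch

# With the union flattened to strings, a LIKE-style filter becomes trivial.
rows = [["test1", "host2"], "testhost", None, "other-host"]
matches = [v for v in map(union_to_string, rows) if "test" in v]
```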

 

Regards

Sathish Valluri

 

 

 

 

 




Hive-exec package JDBC issue

2014-04-10 Thread Valluri, Sathish
Hi all,



We are trying to use the Hive 0.12 JDBC driver to connect to a remote Hive Server and execute queries.

We found a strange issue with this driver: it depends on the hive-exec package, and hive-exec internally bundles the com.google.common classes (please check the attached hive-exec_contents.png for details).

Our application uses the com.google.guava package, which provides the latest com.google.common classes (as shown in the attached guava_contents.png), for other purposes.



Can anyone suggest how to remove this com.google.* dependency from hive-exec? Also, does anyone know why hive-exec has this kind of dependency on Google packages?

Right now our application fails to start because of this conflict: it loads the older com.google.common classes from hive-exec and fails with method-not-found errors.
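As a side note, one way to confirm exactly which classes collide: a jar is just a zip archive, so the entry lists of the two jars can be compared directly. A small stdlib-only sketch (no Hive tooling assumed):

```python
import zipfile

def overlapping_entries(jar_a, jar_b):
    """Return .class entries present in both jars -- the duplicate classes
    (e.g. com/google/common/*) that cause method-not-found conflicts."""
    with zipfile.ZipFile(jar_a) as a, zipfile.ZipFile(jar_b) as b:
        common = set(a.namelist()) & set(b.namelist())
    return sorted(e for e in common if e.endswith(".class"))
```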



Regards

Sathish Valluri




Hive unwanted location directory

2014-03-06 Thread Valluri, Sathish
We are creating an external table in Hive, and if the location path (say /testdata, as shown below) is not present in HDFS, Hive creates the '/testdata' dummy folder.

Is there any option in Hive, or any other way, to stop it from creating dummy directories when the location folder does not exist?

Our use case requires many temporary tables to be created dynamically, and we end up creating many unwanted dummy directories when the data is not present in HDFS.



CREATE EXTERNAL TABLE testTable ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES 
('avro.schema.literal'='{ ') STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 
'/testdata/';



Regards

Sathish Valluri





Create table in Hive always throws an error

2013-11-15 Thread Valluri, Sathish
When trying to create a table in Hive I always get the following error, and I am not able to do any Hive DDL operations:



FAILED: Error in metadata: MetaException(message:java.lang.RuntimeException: 
commitTransaction was called but openTransactionCalls = 0. This probably 
indicates that there are unbalanced calls to openTransaction/commitTransaction)

FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask



Can anyone help, or give any idea why this failure happens every time?



I have looked at the similar issue reported against Hive (https://issues.apache.org/jira/browse/HIVE-4996), but that one appears to be observed intermittently, not every time.





Regards

Sathish Valluri



Any suggestions: java.io.IOException: Not a data file error

2013-10-30 Thread Valluri, Sathish
Resending after disabling security signing.



From: Valluri, Sathish [mailto:sathish.vall...@emc.com]
Sent: Wednesday, October 30, 2013 2:17 PM
To: user@hive.apache.org
Subject: Any suggestions: java.io.IOException: Not a data file error



Hi All,



Hive MapReduce jobs fail with the following "java.io.IOException: Not a data file" error when there are non-Avro files in HDFS.

I have created a Hive external table as shown below,



CREATE EXTERNAL TABLE testTable ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES 
('avro.schema.literal'='{ ') STORED AS INPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 
'/testdata/';



Running: select count(*) from testTable;



When /testdata contains only Avro files, the query works fine and returns the results properly.

If /testdata also contains files in some other format, say /testdata/test.txt, the query fails with the following error.



java.io.IOException: java.lang.reflect.InvocationTargetException at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
 at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
 at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:341)
 at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
 at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200) 
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at 
org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:336) at 
org.apache.hadoop.mapred.Child$4.run(Child.java:270) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
 at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: 
java.lang.reflect.InvocationTargetException at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:525) at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:327)
 ... 11 more Caused by: java.io.IOException: Not a data file. at 
org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105) at 
org.apache.avro.file.DataFileReader.(DataFileReader.java:97) at 
org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.(AvroGenericRecordReader.java:72)
 at 
org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
 at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:65)
 ... 16 more





Can anyone suggest a parameter, or any change that needs to be made, for the query to succeed? Basically, Hive should skip the other-format files and load only the Avro files when processing data on HDFS.
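One pragmatic workaround (my own suggestion, not a Hive option) is to sweep the table location and move aside anything that is not an Avro container file before querying. Avro container files begin with the 4-byte magic "Obj" + 0x01, which is exactly the header check whose failure surfaces as "Not a data file". A local-filesystem sketch:

```python
import os

# Avro object container files start with the 4-byte magic b"Obj\x01";
# files without it trigger "java.io.IOException: Not a data file".
AVRO_MAGIC = b"Obj\x01"

def is_avro_container(path):
    """Return True when `path` begins with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC

def non_avro_files(directory):
    """List files under `directory` that Hive's Avro reader would reject."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and not is_avro_container(os.path.join(directory, name))
    )
```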



Waiting for any suggestions to resolve this issue.



Regards

Sathish Valluri





kerberos Hive Server

2013-09-05 Thread Valluri, Sathish
Hi,



I am setting up Kerberos authentication on HiveServer2, using Hive 0.11 from the MapR distribution.

After setting the configuration shown below (hive-site.xml), HiveServer2 fails to start with the following error:

13/09/05 08:59:10 INFO service.AbstractService: Service:HiveServer2 is started.

javax.security.auth.login.LoginException: Kerberos principal should have 3 
parts: root

at 
org.apache.hive.service.auth.HiveAuthFactory.getAuthTransFactory(HiveAuthFactory.java:82)

at 
org.apache.hive.service.cli.thrift.ThriftCLIService.run(ThriftCLIService.java:403)

at java.lang.Thread.run(Thread.java:722)





I have generated the principal and keytab files and I am supplying that same principal name; although it has the proper 3 parts, it still gives this error.

Does anybody know how to resolve this issue, or have any suggestions?





Hive Server hive-site.xml file:

<property>
   <name>hive.server2.authentication</name>
   <value>KERBEROS</value>
   <description>Authentication type</description>
</property>

<property>
   <name>hive.server2.authentication.kerberos.principal</name>
   <value>root/_h...@example.com</value>
   <description>The service principal for the HiveServer2. If _HOST is
   used as the hostname portion, it will be replaced with the actual
   hostname of the running instance.</description>
</property>

<property>
   <name>hive.server2.authentication.kerberos.keytab</name>
   <value>/opt/mapr/hive/hive-0.11/conf/hive.keytab</value>
   <description>The keytab for the HiveServer2 service principal</description>
</property>

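The "should have 3 parts" check concerns the primary/instance@REALM shape of the principal string; the error quoting just "root" suggests HiveServer2 saw only the primary, i.e. the configured value was not picked up in full. A quick standalone sanity check of that shape (my own sketch, not Hive's actual parser):

```python
import re

# A service principal must look like primary/instance@REALM, e.g.
# hive/host.example.com@EXAMPLE.COM -- three parts in total.
PRINCIPAL_RE = re.compile(r"^[^/@]+/[^/@]+@[^/@]+$")

def has_three_parts(principal):
    """Return True when `principal` has the primary/instance@REALM shape."""
    return PRINCIPAL_RE.match(principal) is not None
```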


Regards

Sathish Valluri



RE: hive cli escaping TAB and NEW LINE Characters.

2013-05-06 Thread Valluri, Sathish
That is the idea I had considered, but in our scenario we have little control over how the Avro data is written (i.e. over encoding tabs and newlines as other characters at write time).

Since Avro data can be pumped into the warehouse from many sources, implementing this kind of logic would mean handling TAB and NEWLINE encoding in every data-writing component.

I am interested in whether this can be handled without changing the Avro data itself: reading the Avro data, transforming it into another encoding, and sending it to the CLI output in that form.

Our app would then decode the data and display it.



Regards

Sathish Valluri



From: Sanjay Subramanian [mailto:sanjay.subraman...@wizecommerce.com]
Sent: Saturday, May 04, 2013 12:08 AM
To: user@hive.apache.org
Subject: Re: hive cli escaping TAB and NEW LINE Characters.



+1 to Stephens suggestion...



From: Stephen Sprague <sprag...@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, May 3, 2013 11:29 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: hive cli escaping TAB and NEW LINE Characters.



hate to sound like a broken record but when all else fails think about the transform() function. The notion here is of encoding your tabs and newlines to something like '\t' and '\n' (literally) for instance. If those aren't unique enough use '<>' and '<>' (you get the idea), then having your app decode those strings to real tabs and real newlines when reading it.

What do you think?
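A minimal sketch of the encode-at-output / decode-in-app round trip suggested above; the escape tokens ('\t', '\n', and doubled backslash) are my own choice for illustration:

```python
# Escape literal tabs/newlines in each output cell so the CLI's row and
# column delimiters stay unambiguous; the consuming app reverses the mapping.
def encode_cell(text):
    return (text.replace("\\", "\\\\")   # escape the escape char first
                .replace("\t", "\\t")
                .replace("\n", "\\n"))

def decode_cell(text):
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # map the two-char escape back to the real character
            out.append({"t": "\t", "n": "\n", "\\": "\\"}[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```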









On Fri, May 3, 2013 at 2:07 AM, Valluri, Sathish <sathish.vall...@emc.com> wrote:

Hi All,



We have an application which parses Hive CLI output and displays the results.

I have an external table with data in Avro format; the contents of the Avro files contain TABs and NEWLINEs in the data itself.

Since Hive CLI output rows are delimited by NEWLINEs and columns by TABs, parsing the result set gives wrong results when the actual content contains TAB and NEWLINE characters.

Can anyone suggest some ideas for escaping the TAB and NEWLINE characters in the Hive CLI output when the actual column contents contain TABs and NEWLINEs?



Regards

Sathish Valluri








hive cli escaping TAB and NEW LINE Characters.

2013-05-03 Thread Valluri, Sathish
Hi All,



We have an application which parses Hive CLI output and displays the results.

I have an external table with data in Avro format; the contents of the Avro files contain TABs and NEWLINEs in the data itself.

Since Hive CLI output rows are delimited by NEWLINEs and columns by TABs, parsing the result set gives wrong results when the actual content contains TAB and NEWLINE characters.

Can anyone suggest some ideas for escaping the TAB and NEWLINE characters in the Hive CLI output when the actual column contents contain TABs and NEWLINEs?



Regards

Sathish Valluri