Hive unwanted directories creation issue
We are creating an external table in Hive, and if the location path (say /testdata, as shown below) is not present in HDFS, Hive creates an empty '/testdata' directory. Is there any option in Hive, or any other way, to stop it from creating these directories when the location folder does not exist? We end up with many unwanted empty directories whenever data is not present in HDFS for the many partitions we add after creating the table.

CREATE EXTERNAL TABLE testTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{ ')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testdata/';

Regards
Sathish Valluri
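As far as I know, Hive itself has no switch to suppress this: the metastore creates the LOCATION directory as part of CREATE TABLE when it is missing. One workaround is to test the path yourself and only issue the DDL when it exists. A minimal sketch, with the HDFS existence check stubbed out as a callback (in production it would wrap an HDFS client call such as `hdfs dfs -test -d`; the function name and fake predicate here are illustrative, not Hive APIs):

```python
def external_table_ddl(path, path_exists):
    """Return the CREATE EXTERNAL TABLE DDL only when the HDFS path
    already exists; otherwise return None so no DDL is issued and Hive
    never gets a chance to create the directory."""
    if not path_exists(path):
        return None
    return ("CREATE EXTERNAL TABLE testTable "
            "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' "
            f"LOCATION '{path}'")

# Fake predicate for illustration; a real one would query HDFS.
known_paths = {"/arrayTests"}
exists = lambda p: p.rstrip("/") in known_paths

print(external_table_ddl("/testdata/", exists))   # path missing -> None
print(external_table_ddl("/arrayTests", exists))  # path present -> DDL string
```

The same guard can wrap the dynamic temporary-table creation mentioned later in the thread, so only tables whose data directories already exist are ever registered.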
Hive Avro union data access
Hi,

I have a Hive table created with a union of three different data types (array<string>, string, null) for the alias_host column, as shown:

CREATE EXTERNAL TABLE array_tests
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"alias_host","type":[{"type":"array","items":"string"},"string","null"]}]}')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/arrayTests';

How do I access and query the contents of this table in a WHERE clause? Queries like the one below work when the data type is not a union, but once I declare the column as a union they fail.

Eg: select alias_host from array_tests where alias_host like '%test%' limit 1000;

Error: Error while processing statement: FAILED: SemanticException [Error 10016]: Line 1:32 Argument type mismatch 'alias_host': The 1st argument of EQUAL is expected to a primitive type, but union is found (state=42000,code=10016)

Can anyone suggest how to access and query the contents of union data types?

Regards
Sathish Valluri
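Hive's comparison operators only accept primitive operands, which is why the union-typed column fails semantic analysis. One pragmatic workaround (an assumption on my part, not a built-in Hive feature) is to flatten the union to a plain string in an ETL step before the data reaches Hive, so the table can declare alias_host as string. A sketch of that normalization for values deserialized from the [array<string>, string, null] union:

```python
def normalize_alias_host(value):
    """Collapse an Avro union of [array<string>, string, null] into a
    single plain string so SQL predicates like LIKE and = can apply.
    The join delimiter is an arbitrary choice for this sketch."""
    if value is None:
        return None
    if isinstance(value, list):   # the array<string> branch
        return ",".join(value)
    return value                  # already a string

print(normalize_alias_host(["test1", "test2"]))  # -> test1,test2
print(normalize_alias_host("test3"))             # -> test3
print(normalize_alias_host(None))                # -> None
```

The cost is losing the array structure at query time, but every row becomes LIKE-able with an unchanged query.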
Hive-exec package JDBC issue
Hi all,

We are trying to use the Hive 0.12 JDBC driver to connect to a remote Hive Server and execute queries. We found a strange issue with this driver: it depends on the hive-exec package, and hive-exec has the com.google.common classes packaged into it internally (please check the attached hive-exec_contents.png for details). Our application uses the com.google.guava package, which also ships the latest com.google.common (as shown in the guava_contents.png file), for other purposes. Can anyone suggest how to remove this com.google.* dependency from hive-exec, and does anyone know why hive-exec bundles these Google packages? Right now our application fails to start because of this conflict: it loads the older com.google.common from hive-exec and fails with method-not-found errors.

Regards
Sathish Valluri
Hive unwanted location directory
We are creating an external table in Hive, and if the location path (say /testdata, as shown below) is not present in HDFS, Hive creates an empty '/testdata' directory. Is there any option in Hive, or any other way, to stop it from creating these directories when the location folder does not exist? Our use case needs many temporary tables to be created dynamically, so we end up creating many unwanted empty directories whenever the data is not present in HDFS.

CREATE EXTERNAL TABLE testTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{ ')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testdata/';

Regards
Sathish Valluri
Create table in Hive always throws an error
When trying to create a table in Hive I always get the following error, and I am not able to do any Hive DDL operations:

FAILED: Error in metadata: MetaException(message:java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Can anyone help, or give any idea why this failure happens every time? I have looked at the similar issue reported against Hive, https://issues.apache.org/jira/browse/HIVE-4996, but that one is apparently observed only intermittently, not always.

Regards
Sathish Valluri
Any suggestions: java.io.IOException: Not a data file error
Resending after disabling security signing.

From: Valluri, Sathish [mailto:sathish.vall...@emc.com]
Sent: Wednesday, October 30, 2013 2:17 PM
To: user@hive.apache.org
Subject: Any suggestions: java.io.IOException: Not a data file error

Hi All,

Hive MapReduce jobs fail with the java.io.IOException: Not a data file error below if there are files other than Avro in HDFS. I have created a Hive external table as shown:

CREATE EXTERNAL TABLE testTable
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='{ ')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/testdata/';

Running: select count(*) from testTable;

When /testdata contains Avro files the query works fine and returns the results properly. If /testdata contains files in some other format, say /testdata/test.txt, the query fails with the following error:
java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:341)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:405)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1126)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:327)
        ... 11 more
Caused by: java.io.IOException: Not a data file.
        at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
        at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
        at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.<init>(AvroGenericRecordReader.java:72)
        at org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51)
        at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:65)
        ... 16 more

Can anyone suggest a parameter, or any changes that need to be made, for the query to succeed? Basically, Hive should skip the files in other formats and load only the Avro files when processing data on HDFS. Waiting for any suggestions to resolve this issue.

Regards
Sathish Valluri
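I am not aware of a Hive setting that makes AvroContainerInputFormat skip non-Avro files; it hands every file under the LOCATION to the Avro reader, so one stray test.txt fails the whole job. The practical fix is to keep foreign files out of the table directory. Avro object container files begin with the four magic bytes `Obj\x01`, so a pre-flight check can identify strays before the query runs. A sketch against a local directory (against HDFS one would read the first four bytes through an HDFS client instead of `open()`):

```python
import os
import tempfile

AVRO_MAGIC = b"Obj\x01"  # 4-byte header of an Avro object container file

def is_avro_file(path):
    """True when the file starts with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC

def avro_files(directory):
    """Only the Avro files in a directory, skipping strays like test.txt."""
    paths = (os.path.join(directory, n) for n in sorted(os.listdir(directory)))
    return [p for p in paths if os.path.isfile(p) and is_avro_file(p)]

# Demo: one fake Avro file and one text file in a scratch directory.
d = tempfile.mkdtemp()
with open(os.path.join(d, "part-00000.avro"), "wb") as f:
    f.write(AVRO_MAGIC + b"...rest of container...")
with open(os.path.join(d, "test.txt"), "w") as f:
    f.write("not avro")

print([os.path.basename(p) for p in avro_files(d)])  # -> ['part-00000.avro']
```

Strays found this way can be moved aside before the Hive query, or the loader that writes into /testdata can be made to quarantine them.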
Kerberos Hive Server
Hi,

I am setting up Kerberos authentication on HiveServer2, using Hive 0.11 from the MapR distribution. After setting the configuration below, HiveServer2 fails to start with the following error:

13/09/05 08:59:10 INFO service.AbstractService: Service:HiveServer2 is started.
javax.security.auth.login.LoginException: Kerberos principal should have 3 parts: root
        at org.apache.hive.service.auth.HiveAuthFactory.getAuthTransFactory(HiveAuthFactory.java:82)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.run(ThriftCLIService.java:403)
        at java.lang.Thread.run(Thread.java:722)

I have generated the principal and keytab files and am giving the same principal name; although it has the proper 3 parts, it still gives this error. Does anybody know how to resolve this issue, or have any suggestions?

Hive Server hive-site.xml file:

<property>
  <name>hive.server2.authentication</name>
  <value>KERBEROS</value>
  <description>Authentication type</description>
</property>
<property>
  <name>hive.server2.authentication.kerberos.principal</name>
  <value>root/_h...@example.com</value>
  <description>The service principal for the HiveServer2. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance.</description>
</property>
<property>
  <name>hive.server2.authentication.kerberos.keytab</name>
  <value>/opt/mapr/hive/hive-0.11/conf/hive.keytab</value>
  <description>The keytab for the HiveServer2 service principal</description>
</property>

Regards
Sathish Valluri
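The error text means HiveServer2 ended up with just "root" as the principal, i.e. the instance and realm parts were not picked up from the configuration. One plausible cause (an assumption, not confirmed by the thread) is that the hive-site.xml carrying these properties is not the one HiveServer2 reads, so it falls back to the bare login name. A quick shape check for the configured value, sketched with a hypothetical helper that mirrors the primary/instance@REALM rule the error message refers to:

```python
import re

# primary/instance@REALM -- the three parts HiveServer2 checks for
PRINCIPAL_RE = re.compile(r"^([^/@]+)/([^/@]+)@([^/@]+)$")

def principal_parts(principal):
    """Split a Kerberos service principal into (primary, instance, realm),
    raising the same complaint HiveServer2 makes when a part is missing."""
    m = PRINCIPAL_RE.match(principal)
    if m is None:
        raise ValueError(f"Kerberos principal should have 3 parts: {principal}")
    return m.groups()

print(principal_parts("root/node1.example.com@EXAMPLE.COM"))
# -> ('root', 'node1.example.com', 'EXAMPLE.COM')
```

Running the configured value (after _HOST substitution) through a check like this confirms whether the string HiveServer2 actually sees has all three parts.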
RE: hive cli escaping TAB and NEW LINE Characters.
This is the idea I had thought of too, but in our scenario we have little control over how the Avro data is written (i.e. over encoding tabs and newlines as other characters). Since Avro data can be pumped into the warehouse system from many sources, implementing this kind of logic would mean handling the TAB and NEWLINE encoding in every data-writing component. I am interested in whether this can be handled without touching the Avro data itself: reading the Avro data, transforming it into another encoding, and sending it to the CLI output in that format, so that our app can decode the data and display it.

Regards
Sathish Valluri

From: Sanjay Subramanian [mailto:sanjay.subraman...@wizecommerce.com]
Sent: Saturday, May 04, 2013 12:08 AM
To: user@hive.apache.org
Subject: Re: hive cli escaping TAB and NEW LINE Characters.

+1 to Stephen's suggestion...

From: Stephen Sprague <sprag...@gmail.com>
Date: Friday, May 3, 2013 11:29 AM
To: user@hive.apache.org
Subject: Re: hive cli escaping TAB and NEW LINE Characters.

I hate to sound like a broken record, but when all else fails think about the transform() function. The notion here is to encode your tabs and newlines to something like '\t' and '\n' (literally), for instance; if those aren't unique enough, use some longer unique marker strings (you get the idea), then have your app decode those strings back to real tabs and real newlines when reading. What do you think?

On Fri, May 3, 2013 at 2:07 AM, Valluri, Sathish <sathish.vall...@emc.com> wrote:
Hi All,
We have an application which parses Hive CLI output and displays the results. I have an external table with data in Avro format, and the contents of this Avro data include TAB and NEWLINE characters. Since Hive CLI output rows are delimited by newlines and columns are delimited by tabs, parsing the result set gives wrong results when the actual content contains TAB and NEWLINE characters. Can anyone suggest ideas for delimiting the TAB and NEWLINE characters in the Hive CLI output when the actual column contents contain them?
Regards
Sathish Valluri
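Stephen's encode/decode idea can be verified end to end outside Hive first. A round-trip sketch of the escaping that the reading application would have to implement; the backslash is escaped first on the way in, and the decoder uses a left-to-right scan, so the mapping stays unambiguous:

```python
def encode_field(s):
    """Escape the CLI delimiters: backslash first, then tab and newline."""
    return s.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")

def decode_field(s):
    """Reverse encode_field with a left-to-right scan (a blind chain of
    str.replace calls would mis-handle sequences like backslash + 't')."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append({"\\": "\\", "t": "\t", "n": "\n"}.get(s[i + 1], "\\" + s[i + 1]))
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

row = "host\tname\nwith\\slash"
assert decode_field(encode_field(row)) == row  # round-trips exactly
print(encode_field(row))
```

Inside Hive, the encoding side would sit in a transform() script (or the data-writing step, as discussed above); only the decoder needs to live in the parsing application.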
hive cli escaping TAB and NEW LINE Characters.
Hi All,

We have an application which parses Hive CLI output and displays the results. I have an external table with data in Avro format, and the contents of this Avro data include TAB and NEWLINE characters. Since Hive CLI output rows are delimited by newlines and columns are delimited by tabs, parsing the result set gives wrong results when the actual content contains TAB and NEWLINE characters. Can anyone suggest ideas for delimiting the TAB and NEWLINE characters in the Hive CLI output when the actual column contents contain them?

Regards
Sathish Valluri