Re: [ANNOUNCE] Apache Hive 1.2.0 Released
Great thanks to the team, the release notes are impressive.

On Mon, May 18, 2015 at 11:25 PM, Sushanth Sowmyan khorg...@apache.org wrote:

The Apache Hive team is proud to announce the release of Apache Hive version 1.2.0.

The Apache Hive (TM) data warehouse software facilitates querying and managing large datasets residing in distributed storage. Built on top of Apache Hadoop (TM), it provides:

* Tools to enable easy data extract/transform/load (ETL)
* A mechanism to impose structure on a variety of data formats
* Access to files stored either directly in Apache HDFS (TM) or in other data storage systems such as Apache HBase (TM)
* Query execution via the Apache Hadoop MapReduce, Apache Tez, or Apache Spark frameworks

For Hive release details and downloads, please visit: https://hive.apache.org/downloads.html

Hive 1.2.0 Release Notes are available here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329345&styleName=Text&projectId=12310843

We would like to thank the many contributors who made this release possible.

Regards,
The Apache Hive Team

--
Philippe Kernévez
Technical Director (Switzerland), pkerne...@octo.com
+41 79 888 33 32
Find OCTO on OCTO Talk: http://blog.octo.com
OCTO Technology http://www.octo.com
Re: Order of Partition column and Non Partition column in the WHERE clause
> Would the order of partition columns in the WHERE clause matter for performance?

No, unless you have more complex predicates than an AND. There's one recent regression though - https://issues.apache.org/jira/browse/HIVE-10122 - which release are you on?

> Also, how can I make sure that "partition pruning" is working as intended when checking the execution plan?

"explain extended <query>" shows all the partitions being read via the Path -> Partition section.

Cheers,
Gopal
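To illustrate Gopal's suggestion, a minimal sketch of checking pruning with EXPLAIN EXTENDED (the table and partition column names here are hypothetical, not from the thread):

```sql
-- Hypothetical table partitioned by dt
EXPLAIN EXTENDED
SELECT a
FROM some_table
WHERE dt = '2015-05-18' AND non_part_column = 'z';

-- In the plan output, inspect the "Path -> Partition" entries:
-- with pruning working, only partitions matching dt = '2015-05-18'
-- should be listed; if every partition of the table appears,
-- pruning did not kick in for this query.
```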
Hive on Spark VS Spark SQL
Between Hive on Spark and Spark SQL, which should be better? What are the key characteristics of each, and what are the advantages and disadvantages between them?

guoqing0...@yahoo.com.hk
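For context on the question: Hive on Spark keeps Hive's parser and optimizer and only swaps the execution engine, while Spark SQL is Spark's own SQL layer that can read the Hive metastore. A minimal sketch of switching a Hive session to the Spark engine (this assumes a Spark installation that Hive is configured to use):

```sql
-- In a Hive session, run subsequent queries on the Spark engine
SET hive.execution.engine=spark;

-- By contrast, Spark SQL is used from Spark's own tooling
-- (e.g. the spark-sql shell or a HiveContext in an application),
-- not by switching a setting inside Hive.
```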
javax.jdo.JDOFatalInternalException: Invalid index 1 for DataStoreMapping.
Hi,

I am wondering whether someone has encountered the same (or a similar) problem while using Hive, and I am looking for a solution. I am running a daily batch to import data from MySQL to Hive using Beeline and HS2. Yesterday, the job failed with errors that I haven't seen before. The following is an excerpt from the HS2 log:

...
2015-05-19 01:13:52,483 INFO exec.Task (SessionState.java:printInfo(417)) - Loading data to table kago_comicoshop.addresses partition (dt=2015-05-18) from hdfs://mycluster/tmp/hive-hive/hive_2015-05-19_01-13-36_529_2684279023796649469-3/-ext-1
2015-05-19 01:13:52,746 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(157)) - Retrying HMSHandler after 1000 ms (attempt 1 of 1) with error: javax.jdo.JDOFatalInternalException: Invalid index 1 for DataStoreMapping.
    at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:591)
    at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
    at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
    at org.apache.hadoop.hive.metastore.ObjectStore.addPartition(ObjectStore.java:1196)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
    at com.sun.proxy.$Proxy12.addPartition(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.append_partition_common(HiveMetaStore.java:1547)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.append_partition_with_environment_context(HiveMetaStore.java:1602)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
    at com.sun.proxy.$Proxy13.append_partition_with_environment_context(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:424)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:418)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
    at com.sun.proxy.$Proxy16.appendPartition(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1612)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1233)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:409)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1485)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1263)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:926)
...
NestedThrowablesStackTrace:
Invalid index 1 for DataStoreMapping.
org.datanucleus.exceptions.NucleusException: Invalid index 1 for DataStoreMapping.
    at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:591)
    at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
    at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
    ...

When I restarted HS2 and reran the batch, it went well without any problem.

Best wishes,
Han-Cheol

趙漢哲 CHO, HAN-CHEOL (Ph.D)
Data Research Lab / Staff
--
Toranomon Hills Mori Tower 22F, 1-23-1 Toranomon, Minato-ku, Tokyo 105-6322
Email hancheol@nhn-playart.com
Messenger NHN PlayArt Corp.
Order of Partition column and Non Partition column in the WHERE clause
Hello,

Would the order of partition columns in the WHERE clause matter for performance? For example, would there be any difference in performance between the queries below?

select a from table where part_column = 'y' and non_part_column = 'z'

or

select a from table where non_part_column = 'z' and part_column = 'y'

Also, how can I make sure that "partition pruning" is working as intended when checking the execution plan?

Thanks in advance.
Hive timeout while loading hashtable file?
I have a pretty straightforward multi-table join that consistently times out at the 300-second limit without any other error. The last several lines of the log are below; any hint as to what went wrong? From the log, it looks like it is failing while loading the hashtable file from a tmp file.

2015-05-19 12:36:37,332 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Initializing Self RS[5]
2015-05-19 12:36:37,332 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Using tag = -1
2015-05-19 12:36:37,334 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Operator 5 RS initialized
2015-05-19 12:36:37,334 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Initialization Done 5 RS
2015-05-19 12:36:37,334 INFO [main] org.apache.hadoop.hive.ql.exec.GroupByOperator: Initialization Done 4 GBY
2015-05-19 12:36:37,334 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 3 SEL
2015-05-19 12:36:37,336 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: __HASH_MAP_MAPJOIN_42_container
2015-05-19 12:36:37,336 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: __HASH_MAP_MAPJOIN_42_serde
2015-05-19 12:36:37,336 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Try to retrieve from cache
2015-05-19 12:36:37,336 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Did not find tables in cache
2015-05-19 12:36:37,337 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Initialization Done 2 MAPJOIN
2015-05-19 12:36:37,337 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: __HASH_MAP_MAPJOIN_43_container
2015-05-19 12:36:37,337 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: __HASH_MAP_MAPJOIN_43_serde
2015-05-19 12:36:37,337 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Try to retrieve from cache
2015-05-19 12:36:37,337 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Did not find tables in cache
2015-05-19 12:36:37,337 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Initialization Done 1 MAPJOIN
2015-05-19 12:36:37,337 INFO [main] org.apache.hadoop.hive.ql.exec.HashTableDummyOperator: Initialization Done 7 HASHTABLEDUMMY
2015-05-19 12:36:37,342 INFO [main] org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2015-05-19 12:36:37,342 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: *** Load from HashTable for input file: hdfs://nameservice1/tmp/hive/bthomson/7cfb6499-04a0-4d96-a0fc-5001e3cdb413/hive_2015-05-19_12-24-11_656_259620018843960962-1/-mr-10005/02_0
2015-05-19 12:36:37,343 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Load back 1 hashtable file from tmp file uri:file:/data/2/hadoop/yarn/local/usercache/bthomson/appcache/application_1430337284339_2029/container_1430337284339_2029_01_03/Stage-5.tar.gz/MapJoin-mapfile150--.hashtable
2015-05-19 12:36:56,925 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: This is not bucket map join, so cache
2015-05-19 12:36:56,925 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: __HASH_MAP_MAPJOIN_43_container
2015-05-19 12:36:56,925 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: __HASH_MAP_MAPJOIN_43_serde
2015-05-19 12:36:56,925 INFO [main] org.apache.hadoop.hive.ql.log.PerfLogger: </PERFLOG method=LoadHashtable start=1432053397342 end=1432053416925 duration=19583 from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2015-05-19 12:36:56,926 INFO [main] org.apache.hadoop.hive.ql.log.PerfLogger: <PERFLOG method=LoadHashtable from=org.apache.hadoop.hive.ql.exec.MapJoinOperator>
2015-05-19 12:36:56,926 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: *** Load from HashTable for input file: hdfs://nameservice1/tmp/hive/bthomson/7cfb6499-04a0-4d96-a0fc-5001e3cdb413/hive_2015-05-19_12-24-11_656_259620018843960962-1/-mr-10005/02_0
2015-05-19 12:36:56,926 INFO [main] org.apache.hadoop.hive.ql.exec.MapJoinOperator: Load back 1 hashtable file from tmp file uri:file:/data/2/hadoop/yarn/local/usercache/bthomson/appcache/application_1430337284339_2029/container_1430337284339_2029_01_03/Stage-5.tar.gz/MapJoin-mapfile141--.hashtable
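If the hashtable load is indeed what exceeds the timeout, two hedged workarounds to try per-session (this assumes the 300 s limit comes from the MapReduce task timeout; the values below are illustrative, not recommendations):

```sql
-- Raise the MR task timeout (in ms) so a slow hashtable load can finish
SET mapreduce.task.timeout=600000;

-- Or avoid the map-join conversion entirely, so no hashtable is built
-- (the join then runs as a regular shuffle join, usually slower but safer)
SET hive.auto.convert.join=false;
```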
Re: Output of Hive
Your WHERE clause is returning 0 rows to the query. Either the filter needs to be tweaked OR there is something wrong with your table. Try doing a count on the table without filters to see if that works, and then maybe add filters in one by one to see where you lose results.

Abe

On Sat, May 16, 2015 at 7:40 AM, Anand Murali anand_vi...@yahoo.com wrote:

Dear All:

I am new to hive so pardon my ignorance. I have the following query but do not see any output. I wondered if it might be in HDFS and checked there, but I do not find it there either. Can somebody advise?

hive> select year, MAX(Temperature) from records where temperature and (quality = 0 or quality = 1 or quality = 4 or quality = 5 or quality = 9) group by year;
Query ID = anand_vihar_20150516170505_9b23d8ba-19d7-4fa7-b972-4f199e3bf56a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2015-05-16 17:05:11,504 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local927727978_0003
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 5329140 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.258 seconds

Thanks

Anand Murali
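Abe's narrow-it-down approach can be sketched as follows (table and column names follow the original query; the `temperature IS NOT NULL` rewrite is an assumption about what the bare `temperature` predicate was meant to express):

```sql
-- 1) Sanity check: does the table return rows at all without filters?
SELECT COUNT(*) FROM records;

-- 2) Add the predicates back one at a time to find which one drops all rows
SELECT COUNT(*) FROM records WHERE temperature IS NOT NULL;
SELECT COUNT(*) FROM records WHERE quality IN (0, 1, 4, 5, 9);
```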