Re: [ANNOUNCE] Apache Hive 1.2.0 Released

2015-05-19 Thread Philippe Kernévez
Many thanks to the team; the release notes are impressive...



On Mon, May 18, 2015 at 11:25 PM, Sushanth Sowmyan khorg...@apache.org
wrote:


 The Apache Hive team is proud to announce the release of Apache Hive
 version 1.2.0.

 The Apache Hive (TM) data warehouse software facilitates querying and
 managing large datasets residing in distributed storage. Built on top of
 Apache Hadoop (TM), it provides:

 * Tools to enable easy data extract/transform/load (ETL)

 * A mechanism to impose structure on a variety of data formats

 * Access to files stored either directly in Apache HDFS (TM) or in other
 data storage systems such as Apache HBase (TM)

 * Query execution via Apache Hadoop MapReduce, Apache Tez or Apache Spark
 frameworks.

 For Hive release details and downloads, please visit:
 https://hive.apache.org/downloads.html

 Hive 1.2.0 Release Notes are available here:
 https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12329345&styleName=Text&projectId=12310843

 We would like to thank the many contributors who made this release
 possible.

 Regards,

 The Apache Hive Team




-- 
Philippe Kernévez



Technical Director (Switzerland),
pkerne...@octo.com
+41 79 888 33 32

Find OCTO on OCTO Talk: http://blog.octo.com
OCTO Technology http://www.octo.com


Re: Order of Partition column and Non Partition column in the WHERE clause

2015-05-19 Thread Gopal Vijayaraghavan
 Would the order of partition column in the where clause matter for
performance?

No, unless you have more complex predicates than an AND.

There's one recent regression though -
https://issues.apache.org/jira/browse/HIVE-10122

Which release are you on?

 Also, how can I make sure that "partition pruning" is working as
intended when checking the execution plan?
 

explain extended <query>

shows all the partitions being read via the Path -> Partition section.
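
A minimal sketch of that check, reusing the column names from the question;
my_table is a hypothetical table name, and part_column is assumed to be its
partition column:

```sql
-- Hypothetical table name; part_column is assumed to be the partition column.
EXPLAIN EXTENDED
SELECT a
FROM my_table
WHERE part_column = 'y'
  AND non_part_column = 'z';
```

If pruning is working, only the part_column=y partition should appear in the
plan's partition listing, not every partition of the table.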

Cheers,
Gopal




Hive on Spark VS Spark SQL

2015-05-19 Thread guoqing0...@yahoo.com.hk
Which is better, Hive on Spark or Spark SQL? What are the key
characteristics, advantages, and disadvantages of each?



guoqing0...@yahoo.com.hk


javax.jdo.JDOFatalInternalException: Invalid index 1 for DataStoreMapping.

2015-05-19 Thread Han-Cheol Cho
Hi, 
I am wondering whether anyone has encountered the same (or a similar) problem
while using Hive, and I am looking for a solution to it.

I am running a daily batch that imports data from MySQL into Hive using
beeline and HS2. Yesterday, the job failed with errors that I haven't seen
before. The following is an excerpt from the HS2 log:

...
2015-05-19 01:13:52,483 INFO  exec.Task (SessionState.java:printInfo(417)) - Loading data to table kago_comicoshop.addresses partition (dt=2015-05-18) from hdfs://mycluster/tmp/hive-hive/hive_2015-05-19_01-13-36_529_2684279023796649469-3/-ext-1
2015-05-19 01:13:52,746 ERROR metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(157)) - Retrying HMSHandler after 1000 ms (attempt 1 of 1) with error: javax.jdo.JDOFatalInternalException: Invalid index 1 for DataStoreMapping.
    at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:591)
    at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
    at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
    at org.apache.hadoop.hive.metastore.ObjectStore.addPartition(ObjectStore.java:1196)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
    at com.sun.proxy.$Proxy12.addPartition(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.append_partition_common(HiveMetaStore.java:1547)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.append_partition_with_environment_context(HiveMetaStore.java:1602)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
    at com.sun.proxy.$Proxy13.append_partition_with_environment_context(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:424)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(HiveMetaStoreClient.java:418)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
    at com.sun.proxy.$Proxy16.appendPartition(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1612)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1233)
    at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:409)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1485)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1263)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1091)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:926)
    ...
NestedThrowablesStackTrace:
Invalid index 1 for DataStoreMapping.
org.datanucleus.exceptions.NucleusException: Invalid index 1 for DataStoreMapping.
    at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:591)
    at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
    at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
    ...

When I restarted HS2 and reran the batch, it completed without any problem.

Best wishes,
Han-Cheol
 趙漢哲  CHO, HAN-CHEOL (Ph.D)
 Data Research Lab / Employee
  -- 22F Toranomon Hills Mori Tower, 1-23-1 Toranomon, Minato-ku, Tokyo 105-6322
Email  hancheol@nhn-playart.com   Messenger

NHN PlayArt Corp.
 

Order of Partition column and Non Partition column in the WHERE clause

2015-05-19 Thread reveen joe
Hello,



Would the order of partition column in the where clause matter for
performance?



For example, would there be any difference in performance between the queries
below?



select a from table where part_column = 'y' and non_part_column = 'z'



or



select a from table where non_part_column = 'z' and part_column = 'y'



Also, how can I make sure that “partition pruning” is working as intended
when checking the execution plan?



Thanks in advance.


Hive timeout while loading hashtable file?

2015-05-19 Thread Frank Luo
I have a pretty straightforward multi-table join that consistently times out
at the 300-second limit without any other error. The last several lines of the
log are below. Any hint as to what went wrong? From the log, it looks like it
is failing while loading a hashtable file from a tmp file.

19 12:36:37,332 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 
Initializing Self RS[5]
2015-05-19 12:36:37,332 INFO [main] 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Using tag = -1
2015-05-19 12:36:37,334 INFO [main] 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Operator 5 RS initialized
2015-05-19 12:36:37,334 INFO [main] 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Initialization Done 5 RS
2015-05-19 12:36:37,334 INFO [main] 
org.apache.hadoop.hive.ql.exec.GroupByOperator: Initialization Done 4 GBY
2015-05-19 12:36:37,334 INFO [main] 
org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 3 SEL
2015-05-19 12:36:37,336 INFO [main] 
org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: 
__HASH_MAP_MAPJOIN_42_container
2015-05-19 12:36:37,336 INFO [main] 
org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: 
__HASH_MAP_MAPJOIN_42_serde
2015-05-19 12:36:37,336 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator: Try to retrieve from cache
2015-05-19 12:36:37,336 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator: Did not find tables in cache
2015-05-19 12:36:37,337 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator: Initialization Done 2 MAPJOIN
2015-05-19 12:36:37,337 INFO [main] 
org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: 
__HASH_MAP_MAPJOIN_43_container
2015-05-19 12:36:37,337 INFO [main] 
org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring retrieval request: 
__HASH_MAP_MAPJOIN_43_serde
2015-05-19 12:36:37,337 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator: Try to retrieve from cache
2015-05-19 12:36:37,337 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator: Did not find tables in cache
2015-05-19 12:36:37,337 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator: Initialization Done 1 MAPJOIN
2015-05-19 12:36:37,337 INFO [main] 
org.apache.hadoop.hive.ql.exec.HashTableDummyOperator: Initialization Done 7 
HASHTABLEDUMMY
2015-05-19 12:36:37,342 INFO [main] org.apache.hadoop.hive.ql.log.PerfLogger: 
PERFLOG method=LoadHashtable 
from=org.apache.hadoop.hive.ql.exec.MapJoinOperator
2015-05-19 12:36:37,342 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator: *** Load from HashTable for 
input file: 
hdfs://nameservice1/tmp/hive/bthomson/7cfb6499-04a0-4d96-a0fc-5001e3cdb413/hive_2015-05-19_12-24-11_656_259620018843960962-1/-mr-10005/02_0
2015-05-19 12:36:37,343 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator:  Load back 1 hashtable file 
from tmp file 
uri:file:/data/2/hadoop/yarn/local/usercache/bthomson/appcache/application_1430337284339_2029/container_1430337284339_2029_01_03/Stage-5.tar.gz/MapJoin-mapfile150--.hashtable
2015-05-19 12:36:56,925 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator: This is not bucket map join, so 
cache
2015-05-19 12:36:56,925 INFO [main] 
org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: 
__HASH_MAP_MAPJOIN_43_container
2015-05-19 12:36:56,925 INFO [main] 
org.apache.hadoop.hive.ql.exec.mr.ObjectCache: Ignoring cache key: 
__HASH_MAP_MAPJOIN_43_serde
2015-05-19 12:36:56,925 INFO [main] org.apache.hadoop.hive.ql.log.PerfLogger: 
/PERFLOG method=LoadHashtable start=1432053397342 end=1432053416925 
duration=19583 from=org.apache.hadoop.hive.ql.exec.MapJoinOperator
2015-05-19 12:36:56,926 INFO [main] org.apache.hadoop.hive.ql.log.PerfLogger: 
PERFLOG method=LoadHashtable 
from=org.apache.hadoop.hive.ql.exec.MapJoinOperator
2015-05-19 12:36:56,926 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator: *** Load from HashTable for 
input file: 
hdfs://nameservice1/tmp/hive/bthomson/7cfb6499-04a0-4d96-a0fc-5001e3cdb413/hive_2015-05-19_12-24-11_656_259620018843960962-1/-mr-10005/02_0
2015-05-19 12:36:56,926 INFO [main] 
org.apache.hadoop.hive.ql.exec.MapJoinOperator:  Load back 1 hashtable file 
from tmp file 
uri:file:/data/2/hadoop/yarn/local/usercache/bthomson/appcache/application_1430337284339_2029/container_1430337284339_2029_01_03/Stage-5.tar.gz/MapJoin-mapfile141--.hashtable



Re: Output of Hive

2015-05-19 Thread Abe Weinograd
Your WHERE clause is returning 0 rows to the query.  Either the filter
needs to be tweaked OR there is something wrong with your table.

Try doing a count on the table without filters to see if that works and
then maybe add filters in one by one to see where you lose results.
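
The steps above could be sketched as follows, using the records table and the
quality values from the quoted query. The temperature condition is cut off in
the quoted message, so it is left out here. The last query is an extra check
that was not part of the original query; it assumes the usual Hive behaviour
of returning NULL for fields it cannot parse:

```sql
-- Step 1: confirm the table returns rows at all.
SELECT COUNT(*) FROM records;

-- Step 2: add the quality filter on its own
-- (equivalent to the chain of ORs in the original query).
SELECT COUNT(*) FROM records
WHERE quality IN (0, 1, 4, 5, 9);

-- Optional check: COUNT(col) skips NULLs, so a zero here while
-- Step 1 is non-zero suggests the column is not being parsed.
SELECT COUNT(temperature) AS non_null_temperature,
       COUNT(quality)     AS non_null_quality
FROM records;
```

Whichever step first drops to zero points at the filter, or the column, to
look at more closely.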

Abe

On Sat, May 16, 2015 at 7:40 AM, Anand Murali anand_vi...@yahoo.com wrote:

 Dear All:

 I am new to Hive, so pardon my ignorance. I have the following query but do
 not see any output. I wondered whether it might be in HDFS and checked there,
 but did not find it. Can somebody advise?

 hive> select year, MAX(Temperature) from records where temperature  
 and (quality = 0 or quality = 1 or quality = 4 or quality = 5 or quality =
 9)
  group by year
  ;
 Query ID = anand_vihar_20150516170505_9b23d8ba-19d7-4fa7-b972-4f199e3bf56a
 Total jobs = 1
 Launching Job 1 out of 1
 Number of reduce tasks not specified. Estimated from input data size: 1
 In order to change the average load for a reducer (in bytes):
   set hive.exec.reducers.bytes.per.reducer=<number>
 In order to limit the maximum number of reducers:
   set hive.exec.reducers.max=<number>
 In order to set a constant number of reducers:
   set mapreduce.job.reduces=<number>
 Job running in-process (local Hadoop)
 2015-05-16 17:05:11,504 Stage-1 map = 100%,  reduce = 100%
 Ended Job = job_local927727978_0003
 MapReduce Jobs Launched:
 Stage-Stage-1:  HDFS Read: 5329140 HDFS Write: 0 SUCCESS
 Total MapReduce CPU Time Spent: 0 msec
 OK
 Time taken: 1.258 seconds

 Thanks

 Anand Murali