Bucketing is not leveraged in filter push down?

2015-09-24 Thread Jeff Zhang
I have a table which is bucketed on the column `name`. Then I run the
following SQL:

- select count(1) from student_bucketed_2 where name = 'calvin
nixon';

Ideally I think it should only scan one bucket, but from the log I can
see that it still scans all the bucket files. Why is bucketing not
leveraged in filter push down?

Here's the log:

2015-09-24 14:59:22,282 INFO [TezChild] io.HiveContextAwareRecordReader:
Processing file hdfs://
0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/01_0
2015-09-24 14:59:22,282 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
2015-09-24 14:59:22,289 INFO [TezChild] io.HiveContextAwareRecordReader:
Processing file hdfs://
0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/02_0
2015-09-24 14:59:22,289 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
2015-09-24 14:59:22,296 INFO [TezChild] io.HiveContextAwareRecordReader:
Processing file hdfs://
0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/03_0
2015-09-24 14:59:22,296 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
2015-09-24 14:59:22,304 INFO [TezChild] io.HiveContextAwareRecordReader:
Processing file hdfs://
0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/04_0
2015-09-24 14:59:22,304 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
2015-09-24 14:59:22,311 INFO [TezChild] io.HiveContextAwareRecordReader:
Processing file hdfs://
0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/05_0


-- 
Best Regards

Jeff Zhang


exception

2015-09-24 Thread ram kumar
Hi,

1) I created a table in Hive:

hive -e "DROP TABLE login1;"
hive -e "CREATE EXTERNAL TABLE login1 (
 et STRING,
 uuid STRING
 )
 ROW FORMAT SERDE 'com.proofpoint.hive.serde.JsonSerde'
 LOCATION 's3n://testing/test';"

Creating the table worked.


2) Query:

I can run SELECT * FROM login1;

but when trying SELECT COUNT(*) FROM login1;

I get the following error:

Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:312)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.(HadoopShimsSecure.java:259)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:386)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:652)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:298)
... 11 more
Caused by: java.lang.RuntimeException:
java.lang.ClassNotFoundException: Class
org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2076)
at 
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at 
org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at 
org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:107)
at 
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:65)
... 16 more
Caused by: java.lang.ClassNotFoundException: Class
org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1982)
at 
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
... 26 more

I added hadoop-aws.jar in hadoop-env.sh.

hadoop classpath:
/usr/hdp/2.2.0.0-2041/hadoop/conf:/usr/hdp/2.2.0.0-2041/hadoop/lib/*:/usr/hdp/2.2.0.0-2041/hadoop/.//*:/usr/hdp/2.2.0.0-2041/hadoop-hdfs/./:/usr/hdp/2.2.0.0-2041/hadoop-hdfs/lib/*:/usr/hdp/2.2.0.0-2041/hadoop-hdfs/.//*:/usr/hdp/2.2.0.0-2041/hadoop-yarn/lib/*:/usr/hdp/2.2.0.0-2041/hadoop-yarn/.//*:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/lib/*:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/.//*::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/hdp/current/hadoop-mapreduce-client/*:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-aws.jar
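
Since the ClassNotFoundException is raised inside the YARN task (YarnChild),
I suspect the class is only on the client-side classpath and not on the
classpath of the task JVMs. One thing I am going to try is shipping the jar
with the Hive session itself; a minimal sketch, with the jar path taken from
the HDP location in the classpath output above:

ADD JAR /usr/hdp/2.2.0.0-2041/hadoop-mapreduce/hadoop-aws.jar;
SELECT COUNT(*) FROM login1;

As I understand it, ADD JAR also ships the file to the cluster with the job,
which is why it can behave differently from a client-only change in
hadoop-env.sh.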

Can anyone help me with this?
Thanks


Re: hive ORC wrong number of index entries error

2015-09-24 Thread Patrick Duin
Thanks for the reply.
My first thought was out of memory as well, but the IllegalArgumentException
happens before the OOM and is a separate entry in the log; the OOM exception
is not the cause. So I am not sure where that OOM exception fits in. I've
tried running it with more memory and got the same problem; it was also
consistently failing on the same split.
We have about 650 columns. I don't know how many record writers are open
(how can I see that?).
I'll try running it with a reduced stripe size and see if that helps.
The weird thing is we have a production cluster that is running the same
hadoop/hive versions, the same code and the same data, and it processes just
fine; I get this error only in our QA cluster.
It's hard to locate the difference :).
Anyway, thanks for the pointers, I'll do some more digging.

Cheers,
 Patrick

2015-09-24 0:51 GMT+01:00 Prasanth Jayachandran <
pjayachand...@hortonworks.com>:

> Looks like you are running out of memory. Try increasing the heap
> memory or reducing the stripe size. How many columns are you writing? Any
> idea how many record writers are open per map task?
>
> - Prasanth
>
> On Sep 22, 2015, at 4:32 AM, Patrick Duin  wrote:
>
> Hi all,
>
> I am struggling trying to understand a stack trace I am getting trying to
> write an ORC file:
> I am using hive-0.13.0/hadoop-2.4.0.
>
> 2015-09-21 09:15:44,603 INFO [main] org.apache.hadoop.mapred.MapTask: 
> Ignoring exception during close for 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@2ce49e21
> java.lang.IllegalArgumentException: Column has wrong number of index entries 
> found: 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry$Builder@6eeb967b 
> expected: 1
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:578)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1398)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
>   at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> 2015-09-21 09:15:45,988 FATAL [main] org.apache.hadoop.mapred.YarnChild: 
> Error running child : java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
>   at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
>   at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:583)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
>   at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> I've seen https://issues.apache.org/jira/browse/HIVE-9080 and I think that 
> might be related.
>
> I am not using hive though I am using a Map only job that writes to an 
> OrcNewOutputFormat.class.
>
> Any pointers would be appreciated, anyone seen this before?
>
>
> Thanks,
>
>  Patrick
>
>
>


Re: hive ORC wrong number of index entries error

2015-09-24 Thread Prasanth Jayachandran
With 650 columns you might need to reduce the compression buffer size from the
default of 256KB down to 8KB (maybe try decreasing it further if it still
fails, or increasing it if it succeeds, to find the right size). You can do
that by setting the orc.compress.size table property.
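
For a Hive-managed ORC table that looks roughly like the sketch below (the
table name is illustrative and the value is in bytes, so 8KB is 8192); your
map-only job writes through OrcNewOutputFormat rather than a Hive table, so
there the equivalent writer setting would have to be applied instead:

-- existing table
ALTER TABLE my_orc_table SET TBLPROPERTIES ('orc.compress.size'='8192');

-- or at creation time
CREATE TABLE my_orc_table (id INT, payload STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress.size'='8192');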

On Sep 24, 2015, at 3:27 AM, Patrick Duin <patd...@gmail.com> wrote:

Thanks for the reply,
My first thought was out of memory as well but the illegal argument exception 
happens before it is a separate entry in the log, The OOM exception is not the 
cause. So I am not sure where that OOM exception fits in. I've tried running it 
with more memory and got the same problem it was also consistently failing on 
the same split.
We have about 650 columns. I don't know how many record writers are open (how 
can I see that?).
I'll try running it with a reduced stripe size see if that helps.
The weird thing is we have a production cluster that is running same 
hadoop/hive versions, same code and same data and processing just fine I get 
this error only in our QA cluster.
It's hard to locate the difference :).
Anyway thanks for the pointers I'll do some more digging.

Cheers,
 Patrick

2015-09-24 0:51 GMT+01:00 Prasanth Jayachandran <pjayachand...@hortonworks.com>:
Looks like you are running out of memory. Try increasing the heap memory or 
reducing the stripe size. How many columns are you writing? Any idea how many 
record writers are open per map task?

- Prasanth

On Sep 22, 2015, at 4:32 AM, Patrick Duin <patd...@gmail.com> wrote:

Hi all,

I am struggling trying to understand a stack trace I am getting trying to write 
an ORC file:
I am using hive-0.13.0/hadoop-2.4.0.


2015-09-21 09:15:44,603 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring 
exception during close for 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@2ce49e21
java.lang.IllegalArgumentException: Column has wrong number of index entries 
found: org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry$Builder@6eeb967b 
expected: 1
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:578)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1398)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2015-09-21 09:15:45,988 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error 
running child : java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at 
org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:583)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
at 
org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
at 
org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
at 
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)


I've seen https://issues.apache.org/jira/browse/HIVE-9080 and I think that 
mig

Re: hive ORC wrong number of index entries error

2015-09-24 Thread Patrick Duin
Cool, thanks, will try.

2015-09-24 9:32 GMT+01:00 Prasanth Jayachandran <
pjayachand...@hortonworks.com>:

> With 650 columns you might need to reduce the compression buffer size to
> 8KB (may be try decreasing it fails or increasing it if it succeeds to find
> the right size) down from default 256KB. You can do that by setting
> orc.compress.size tblproperties.
>
> On Sep 24, 2015, at 3:27 AM, Patrick Duin  wrote:
>
> Thanks for the reply,
> My first thought was out of memory as well but the illegal argument
> exception happens before it is a separate entry in the log, The OOM
> exception is not the cause. So I am not sure where that OOM exception fits
> in. I've tried running it with more memory and got the same problem it was
> also consistently failing on the same split.
> We have about 650 columns. I don't know how many record writers are open
> (how can I see that?).
> I'll try running it with a reduced stripe size see if that helps.
> The weird thing is we have a production cluster that is running same
> hadoop/hive versions, same code and same data and processing just fine I
> get this error only in our QA cluster.
> It's hard to locate the difference :).
> Anyway thanks for the pointers I'll do some more digging.
>
> Cheers,
>  Patrick
>
> 2015-09-24 0:51 GMT+01:00 Prasanth Jayachandran <
> pjayachand...@hortonworks.com>:
>
>> Looks like you are running out of memory. Try increasing the heap
>> memory or reducing the stripe size. How many columns are you writing? Any
>> idea how many record writers are open per map task?
>>
>> - Prasanth
>>
>> On Sep 22, 2015, at 4:32 AM, Patrick Duin  wrote:
>>
>> Hi all,
>>
>> I am struggling trying to understand a stack trace I am getting trying to
>> write an ORC file:
>> I am using hive-0.13.0/hadoop-2.4.0.
>>
>> 2015-09-21 09:15:44,603 INFO [main] org.apache.hadoop.mapred.MapTask: 
>> Ignoring exception during close for 
>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector@2ce49e21
>> java.lang.IllegalArgumentException: Column has wrong number of index entries 
>> found: 
>> org.apache.hadoop.hive.ql.io.orc.OrcProto$RowIndexEntry$Builder@6eeb967b 
>> expected: 1
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:578)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1398)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
>>  at 
>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
>>  at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
>>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
>>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>> 2015-09-21 09:15:45,988 FATAL [main] org.apache.hadoop.mapred.YarnChild: 
>> Error running child : java.lang.OutOfMemoryError: Java heap space
>>  at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
>>  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
>>  at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
>>  at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:583)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
>>  at 
>> org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
>>  at 
>> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
>>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
>>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at 
>> org.apache.hadoop.security.UserGroupInfor

Hive Query on Hbase snapshot error

2015-09-24 Thread 核弹头す
Hi all,




I am using Hive to query an HBase snapshot. But I got the following error:

FAILED: IllegalArgumentException 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/test_table_snap0/.snapshotinfo

These are the steps I take:

1. On HBase we already have a table, test_table, and I use the command
snapshot 'test_table', 'test_table_snap0'   to create a snapshot.

2. On Hive I create an external table like this:

CREATE EXTERNAL TABLE `test_table_snapshot`(

  `id` int,

  `alias` string,

  `kdt_id` int,

  `created_time` string,

  `update_time` string,

  `title` string,

  `price` bigint,

  `goods_platform` int,

  `buy_url` string,

  `class1` int,

  `class2` string,

  `goods_type` tinyint,

  `sold_status` tinyint,

  `is_display` tinyint,

  `is_delete` tinyint,

  `num` bigint,

  `buy_way` boolean,

  `source` tinyint,

  `content` string,

  `picture` string,

  `is_virtual` tinyint)

ROW FORMAT SERDE

  'org.apache.hadoop.hive.hbase.HBaseSerDe'

STORED BY

  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES (

  'field.delim'='\u0001',

  
'hbase.columns.mapping'=':key,cf1:alalis,cf1:kdt_id,cf1:created_time,cf1:update_time,cf1:title,cf1:price,cf1:goods_platform,cf1:buy_url,cf1:class1,cf1:class2,cf1:goods_type,cf1:sold_stattus,cf1:is_display,cf1:is_delete,cf1:num,cf1:buy_way,cf1:source,cf1:content,cf1:picture,cf1:is_virtual',

  'line.delim'='\n',

  'serialization.format'='\u0001')

TBLPROPERTIES (

  'hbase.table.name'='test_table')




4. On Hive I execute "select * from test_table_snapshot", and then I get the 
above error.




I have set the "hive.hbase.snapshot.restoredir" value, which is the same as the
root dir on HBase, and I have also set the ZooKeeper server. I checked the
HBase root directory on HDFS and I can see the snapshot files. But when I run
the above query in Hive, it seems that hbase.tmp.dir is used to find the
snapshot info. BTW, if I query HBase directly, there is no problem. I do not
know what happened.




The following is the hive error log:



FAILED: SemanticException 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo

org.apache.hadoop.hive.ql.parse.SemanticException: 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo

at 
org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:117)

at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:182)

at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10207)

at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)

at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)

at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)

at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)

at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)

at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)

at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)

at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)

at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)

at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)

at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)

at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)

at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)

at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.hadoop.util.RunJar.run(RunJar.java:221)

at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Caused by: java.lang.IllegalArgumentException: 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo

at 
org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureTableJobProperties(HBaseStorageHandler.java:401)

at 
org.apache.hadoop.hive.hbase.HBaseStorageHandler.configureInputJobProperties(HBaseStorageHandler.java:317)

at 
org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.j

Re: Bucketing is not leveraged in filter push down?

2015-09-24 Thread matshyeq
There are JIRAs for that already:
HIVE-9523 
HIVE-11525 
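
Until bucket pruning lands, the only way I know of to restrict the scan to a
single bucket by hand is TABLESAMPLE; a rough sketch, assuming the table was
created with 32 buckets on `name` (you would still have to work out which
bucket the value hashes into, so it is a workaround, not a fix):

SELECT count(1)
FROM student_bucketed_2 TABLESAMPLE (BUCKET 1 OUT OF 32 ON name)
WHERE name = 'calvin nixon';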

Thank you,
Kind Regards
~Maciek

On Thu, Sep 24, 2015 at 8:29 AM, Jeff Zhang  wrote:

> I have one table which is bucketed on column name. Then I have the
> following sql:
>
> - select count(1) from student_bucketed_2 where name = 'calvin
> nixon';
>
> Ideally I think it should only scan one bucket. But from the log I still
> see it will scan all the bucket files. Why bucketing is not leveraged in
> filter push down ?
>
> Here's the log:
>
> 2015-09-24 14:59:22,282 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/01_0
> 2015-09-24 14:59:22,282 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
> 0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
> 2015-09-24 14:59:22,289 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/02_0
> 2015-09-24 14:59:22,289 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
> 0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
> 2015-09-24 14:59:22,296 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/03_0
> 2015-09-24 14:59:22,296 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
> 0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
> 2015-09-24 14:59:22,304 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/04_0
> 2015-09-24 14:59:22,304 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
> 0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
> 2015-09-24 14:59:22,311 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/05_0
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Hive Query on Hbase snapshot error

2015-09-24 Thread Sandeep Nemuri
You can check whether the snapshot is healthy or not using the command below.


On Thu, Sep 24, 2015 at 2:55 PM, 核弹头す <510688...@qq.com> wrote:

> Hi all,
>
>
> I am using hive to query on base snapshot. But I got the following  error:
>
> FAILED: IllegalArgumentException
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read
> snapshot info
> from:/tmp/hbase-huser/hbase/.hbase-snapshot/test_table_snap0/.snapshotinfo
>
> The following is the steps that I do:
>
> 1, on the hbase we already have an table: test_table. And I use command: 
> snapshot
> ‘test_table’ ,’test_table_snap0’   to create a  snapshot。
>
> 2,on the hive create an external table like:
>
> CREATE EXTERNAL TABLE `test_table_snapshot`(
>
>   `id` int,
>
>   `alias` string,
>
>   `kdt_id` int,
>
>   `created_time` string,
>
>   `update_time` string,
>
>   `title` string,
>
>   `price` bigint,
>
>   `goods_platform` int,
>
>   `buy_url` string,
>
>   `class1` int,
>
>   `class2` string,
>
>   `goods_type` tinyint,
>
>   `sold_status` tinyint,
>
>   `is_display` tinyint,
>
>   `is_delete` tinyint,
>
>   `num` bigint,
>
>   `buy_way` boolean,
>
>   `source` tinyint,
>
>   `content` string,
>
>   `picture` string,
>
>   `is_virtual` tinyint)
>
> ROW FORMAT SERDE
>
>   'org.apache.hadoop.hive.hbase.HBaseSerDe'
>
> STORED BY
>
>   'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>
> WITH SERDEPROPERTIES (
>
>   'field.delim'='\u0001',
>
>
> 'hbase.columns.mapping'=':key,cf1:alalis,cf1:kdt_id,cf1:created_time,cf1:update_time,cf1:title,cf1:price,cf1:goods_platform,cf1:buy_url,cf1:class1,cf1:class2,cf1:goods_type,cf1:sold_stattus,cf1:is_display,cf1:is_delete,cf1:num,cf1:buy_way,cf1:source,cf1:content,cf1:picture,cf1:is_virtual',
>
>   'line.delim'='\n',
>
>   'serialization.format'='\u0001')
>
> TBLPROPERTIES (
>
>   ‘hbase.table.name'='test_table')
>
>
> 4, on Hive, execute “select * from test_table_snapshot”,  then I get the
> above error.
>
>
> I have set the “hive.hbase.snapshot.restoredir” value which is same as the
> root dir on hbase, and I also set the zookeep server. I checked the base
> root directory on hdfs and I can see the snapshot files. But when I use the
> above query on hive. it seems that the base.tmp.dir is used to find the
> snapshot info. BTW, if I directly query on hbase, there is no problem. I do
> not know what happened?
>
>
> The following is the hive error log:
>
> FAILED: SemanticException
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read
> snapshot info from:/tmp
> /hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo
>
> org.apache.hadoop.hive.ql.parse.SemanticException:
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read
> snapshot info from:/tmp
> /hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo
>
> at
> org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:117)
>
> at
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:182)
>
> at
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:
> 10207)
>
> at
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
>
> at
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
>
> at
> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
>
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
>
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
>
> at
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
>
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:606)
>
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>
> Caused by: java.lang.IllegalArgumentException:
> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read
> snapshot info from:/tmp
> /hbase-huser/hbase/.hbase-snapshot/goods_v3_hbas

Re: Hive Query on Hbase snapshot error

2015-09-24 Thread Sandeep Nemuri
hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test_snapshot
-stats -schema

On Thu, Sep 24, 2015 at 3:43 PM, Sandeep Nemuri 
wrote:

> You can check snapshot state if it is healthy or not using below command.
>
>
> On Thu, Sep 24, 2015 at 2:55 PM, 核弹头す <510688...@qq.com> wrote:
>
>> Hi all,
>>
>>
>> I am using hive to query on base snapshot. But I got the following  error:
>>
>> FAILED: IllegalArgumentException
>> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read
>> snapshot info
>> from:/tmp/hbase-huser/hbase/.hbase-snapshot/test_table_snap0/.snapshotinfo
>>
>> The following is the steps that I do:
>>
>> 1, on the hbase we already have an table: test_table. And I use command: 
>> snapshot
>> ‘test_table’ ,’test_table_snap0’   to create a  snapshot。
>>
>> 2,on the hive create an external table like:
>>
>> CREATE EXTERNAL TABLE `test_table_snapshot`(
>>
>>   `id` int,
>>
>>   `alias` string,
>>
>>   `kdt_id` int,
>>
>>   `created_time` string,
>>
>>   `update_time` string,
>>
>>   `title` string,
>>
>>   `price` bigint,
>>
>>   `goods_platform` int,
>>
>>   `buy_url` string,
>>
>>   `class1` int,
>>
>>   `class2` string,
>>
>>   `goods_type` tinyint,
>>
>>   `sold_status` tinyint,
>>
>>   `is_display` tinyint,
>>
>>   `is_delete` tinyint,
>>
>>   `num` bigint,
>>
>>   `buy_way` boolean,
>>
>>   `source` tinyint,
>>
>>   `content` string,
>>
>>   `picture` string,
>>
>>   `is_virtual` tinyint)
>>
>> ROW FORMAT SERDE
>>
>>   'org.apache.hadoop.hive.hbase.HBaseSerDe'
>>
>> STORED BY
>>
>>   'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>>
>> WITH SERDEPROPERTIES (
>>
>>   'field.delim'='\u0001',
>>
>>
>> 'hbase.columns.mapping'=':key,cf1:alalis,cf1:kdt_id,cf1:created_time,cf1:update_time,cf1:title,cf1:price,cf1:goods_platform,cf1:buy_url,cf1:class1,cf1:class2,cf1:goods_type,cf1:sold_stattus,cf1:is_display,cf1:is_delete,cf1:num,cf1:buy_way,cf1:source,cf1:content,cf1:picture,cf1:is_virtual',
>>
>>   'line.delim'='\n',
>>
>>   'serialization.format'='\u0001')
>>
>> TBLPROPERTIES (
>>
>>   ‘hbase.table.name'='test_table')
>>
>>
>> 4, on Hive, execute “select * from test_table_snapshot”,  then I get the
>> above error.
>>
>>
>> I have set the “hive.hbase.snapshot.restoredir” value which is same as
>> the root dir on hbase, and I also set the zookeep server. I checked the
>> base root directory on hdfs and I can see the snapshot files. But when I
>> use the above query on hive. it seems that the base.tmp.dir is used to find
>> the snapshot info. BTW, if I directly query on hbase, there is no problem.
>> I do not know what happened?
>>
>>
>> The following is the hive error log:
>>
>> FAILED: SemanticException
>> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read
>> snapshot info from:/tmp
>> /hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo
>>
>> org.apache.hadoop.hive.ql.parse.SemanticException:
>> org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read
>> snapshot info from:/tmp
>> /hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo
>>
>> at
>> org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:117)
>>
>> at
>> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:182)
>>
>> at
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:
>> 10207)
>>
>> at
>> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)
>>
>> at
>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
>>
>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
>>
>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
>>
>> at
>> org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
>>
>> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1160)
>>
>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
>>
>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
>>
>> at
>> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
>>
>> at
>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
>>
>> at
>> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
>>
>> at
>> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
>>
>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>>
>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
>>
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>> at java.lang.reflect.Method.invoke(Method.java:606)
>>
>> at 

hive.aux.jars.path and add jar for storage handler

2015-09-24 Thread Lonikar, Kiran
Hi,

We have an application that creates tables with our own storage handler.

It works if I set the env variable HIVE_AUX_JARS_PATH pointing to the dir 
containing the jar of my storage handler class. This env var needs to be set 
before starting HiveServer2. The alternative is to change the hive-site.xml and 
add the following section:


<property>
  <name>hive.aux.jars.path</name>
  <value>file:///path/to/my/storagehandler.jar,file:///path/to/other.jar</value>
</property>

Another alternative is to start the HS2 as:
/usr/bin/hiveserver2 --hiveconf 
hive.aux.jars.path=file:///path/to/my/storagehandler.jar,file:///path/to/other.jar

However, all these ways mean starting the HiveServer2 in a manner that involves 
sysadmin intervention.

I tried to set the configuration property using the "set
hive.aux.jars.path=file:///path/to/my/storagehandler.jar,file:///path/to/other.jar"
command after the JDBC session with HS2 was set up, but it did not have any
effect. I am still getting the class-not-found error for the storage handler
class from HS2 itself, which means the error occurs during query compilation.

I also tried the "add jar ..." command but to no avail.

Can someone explain the role of the property hive.aux.jars.path when it is set
in a session? Also, what is the role of add jar?

Do these take effect only for the MapReduce tasks that get launched? If so,
what is the setting for the HiveServer2 process itself, which needs to load
the storage handler class during query compilation?
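
One direction I am considering, in case it is relevant, is the reloadable
auxiliary path that newer Hive releases (0.14+) expose through
hive.reloadable.aux.jars.path together with the RELOAD command; a rough
sketch, assuming the feature exists in our build and using a hypothetical
handler class name:

-- one-time: point hive.reloadable.aux.jars.path in hive-site.xml at a directory
-- then, after copying storagehandler.jar into that directory, from the session:
RELOAD;
CREATE TABLE demo_table (id INT, payload STRING)
STORED BY 'com.example.MyStorageHandler';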

-Kiran






Re: Hive Query on Hbase snapshot error

2015-09-24 Thread 核弹头す
Thank you for your reply.


The snapshot should be fine. Here is the output of the given command.


 
2015-09-24 19:37:41,981 INFO  [main] Configuration.deprecation: 
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
 
Snapshot Info
 

 
   Name: test_snapshot
 
   Type: FLUSH
 
  Table: test_table
 
 Format: 2
 
Created: 2015-09-24T16:00:11
 


 
Table Descriptor
 

 
'test_table', {TABLE_ATTRIBUTES => {coprocessor$1 => 
'|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 
=> 
'|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', 
coprocessor$3 => 
'|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', 
coprocessor$4 => 
'|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', 
coprocessor$5 => 
'|org.apache.hadoop.hbase.regionserver.LocalIndexSplitter|805306366|'}, {NAME 
=> 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', 
REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS 
=> '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', 
IN_MEMORY => 'false', BLOCKCACHE => 'true'}
 


 


 
32 HFiles (0 in archive), total size 236.1 G (100.00% 236.1 G shared with the 
source table)
 
0 Logs, total size 0





------ Original Message ------
From: "Sandeep Nemuri"
Date: 2015-09-24 (Thu) 6:43 PM
To: "Hive user group"
Subject: Re: Hive Query on Hbase snapshot error



hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test_snapshot 
-stats -schema


On Thu, Sep 24, 2015 at 3:43 PM, Sandeep Nemuri  wrote:
You can check snapshot state if it is healthy or not using below command.



On Thu, Sep 24, 2015 at 2:55 PM,  <510688...@qq.com> wrote:

Hi all,




I am using hive to query on base snapshot. But I got the following  error:

FAILED: IllegalArgumentException 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/test_table_snap0/.snapshotinfo

The following is the steps that I do:

1, on the hbase we already have an table: test_table. And I use command: 
snapshot 'test_table', 'test_table_snap0'   to create a  snapshot.

2, on the hive create an external table like:

CREATE EXTERNAL TABLE `test_table_snapshot`(

  `id` int,

  `alias` string,

  `kdt_id` int,

  `created_time` string,

  `update_time` string,

  `title` string,

  `price` bigint,

  `goods_platform` int,

  `buy_url` string,

  `class1` int,

  `class2` string,

  `goods_type` tinyint,

  `sold_status` tinyint,

  `is_display` tinyint,

  `is_delete` tinyint,

  `num` bigint,

  `buy_way` boolean,

  `source` tinyint,

  `content` string,

  `picture` string,

  `is_virtual` tinyint)

ROW FORMAT SERDE

  'org.apache.hadoop.hive.hbase.HBaseSerDe'

STORED BY

  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES (

  'field.delim'='\u0001',

  
'hbase.columns.mapping'=':key,cf1:alalis,cf1:kdt_id,cf1:created_time,cf1:update_time,cf1:title,cf1:price,cf1:goods_platform,cf1:buy_url,cf1:class1,cf1:class2,cf1:goods_type,cf1:sold_stattus,cf1:is_display,cf1:is_delete,cf1:num,cf1:buy_way,cf1:source,cf1:content,cf1:picture,cf1:is_virtual',

  'line.delim'='\n',

  'serialization.format'='\u0001')

TBLPROPERTIES (

  'hbase.table.name'='test_table')




4, on Hive, execute "select * from test_table_snapshot",  then I get the 
above error.




I have set the "hive.hbase.snapshot.restoredir" value which is same as the 
root dir on hbase, and I also set the zookeep server. I checked the base root 
directory on hdfs and I can see the snapshot files. But when I use the above 
query on hive. it seems that the base.tmp.dir is used to find the snapshot 
info. BTW, if I directly query on hbase, there is no problem. I do not know 
what happened?




The following is the hive error log:



FAILED: SemanticException 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo

org.apache.hadoop.hive.ql.parse.SemanticException: 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo

at 
org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimizer.java:117)

at 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:182)

at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10207)

at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:192)

at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)

at org.apache.hadoop.hive.ql.Dri

Re: Hive Query on Hbase snapshot error

2015-09-24 Thread 核弹头す
It seems that the snapshotDir is not correctly set. I have set
hive.hbase.snapshot.restoredir. Is hive.hbase.snapshot.restoredir meant to
control the snapshotDir? If not, how can I set the snapshotDir when I use Hive
to query an HBase snapshot?
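
For reference, this is roughly what I am setting in the session (the values
are illustrative; as far as I understand, the path the snapshot info is read
from is derived from hbase.rootdir, so that has to point at the real HBase
root on HDFS rather than the local hbase.tmp.dir default, and the restore dir
must be on the same filesystem):

SET hbase.rootdir=hdfs://namenode:8020/hbase;
SET hive.hbase.snapshot.name=test_table_snap0;
SET hive.hbase.snapshot.restoredir=/tmp/hive_hbase_restore;
SELECT * FROM test_table_snapshot;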




------ Original Message ------
From: "核弹头す" <510688...@qq.com>
Date: 2015-09-24 (Thu) 7:44 PM
To: "user"
Subject: Re: Hive Query on Hbase snapshot error



Thank you for you reply.


The snapshot should be no problem. Here is the following output for the giving 
command.


 
2015-09-24 19:37:41,981 INFO  [main] Configuration.deprecation: 
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
 
Snapshot Info
 

 
   Name: test_snapshot
 
   Type: FLUSH
 
  Table: test_table
 
 Format: 2
 
Created: 2015-09-24T16:00:11
 


 
Table Descriptor
 

 
'test_table', {TABLE_ATTRIBUTES => {coprocessor$1 => 
'|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 
=> 
'|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', 
coprocessor$3 => 
'|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', 
coprocessor$4 => 
'|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', 
coprocessor$5 => 
'|org.apache.hadoop.hbase.regionserver.LocalIndexSplitter|805306366|'}, {NAME 
=> 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', 
REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS 
=> '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', 
IN_MEMORY => 'false', BLOCKCACHE => 'true'}
 


 


 
32 HFiles (0 in archive), total size 236.1 G (100.00% 236.1 G shared with the 
source table)
 
0 Logs, total size 0





------ Original Message ------
From: "Sandeep Nemuri"
Date: 2015-09-24 (Thu) 6:43 PM
To: "Hive user group"
Subject: Re: Hive Query on Hbase snapshot error



hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test_snapshot 
-stats -schema


On Thu, Sep 24, 2015 at 3:43 PM, Sandeep Nemuri  wrote:
You can check snapshot state if it is healthy or not using below command.



On Thu, Sep 24, 2015 at 2:55 PM,  <510688...@qq.com> wrote:

Hi all,




I am using hive to query on base snapshot. But I got the following  error:

FAILED: IllegalArgumentException 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/test_table_snap0/.snapshotinfo

The following is the steps that I do:

1, on the hbase we already have an table: test_table. And I use command: 
snapshot 'test_table', 'test_table_snap0'   to create a  snapshot.

2, on the hive create an external table like:

CREATE EXTERNAL TABLE `test_table_snapshot`(

  `id` int,

  `alias` string,

  `kdt_id` int,

  `created_time` string,

  `update_time` string,

  `title` string,

  `price` bigint,

  `goods_platform` int,

  `buy_url` string,

  `class1` int,

  `class2` string,

  `goods_type` tinyint,

  `sold_status` tinyint,

  `is_display` tinyint,

  `is_delete` tinyint,

  `num` bigint,

  `buy_way` boolean,

  `source` tinyint,

  `content` string,

  `picture` string,

  `is_virtual` tinyint)

ROW FORMAT SERDE

  'org.apache.hadoop.hive.hbase.HBaseSerDe'

STORED BY

  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

WITH SERDEPROPERTIES (

  'field.delim'='\u0001',

  
'hbase.columns.mapping'=':key,cf1:alalis,cf1:kdt_id,cf1:created_time,cf1:update_time,cf1:title,cf1:price,cf1:goods_platform,cf1:buy_url,cf1:class1,cf1:class2,cf1:goods_type,cf1:sold_stattus,cf1:is_display,cf1:is_delete,cf1:num,cf1:buy_way,cf1:source,cf1:content,cf1:picture,cf1:is_virtual',

  'line.delim'='\n',

  'serialization.format'='\u0001')

TBLPROPERTIES (

  'hbase.table.name'='test_table')




4, on Hive, execute "select * from test_table_snapshot",  then I get the 
above error.




I have set the "hive.hbase.snapshot.restoredir" value which is same as the 
root dir on hbase, and I also set the zookeep server. I checked the base root 
directory on hdfs and I can see the snapshot files. But when I use the above 
query on hive. it seems that the base.tmp.dir is used to find the snapshot 
info. BTW, if I directly query on hbase, there is no problem. I do not know 
what happened?




The following is the hive error log:



FAILED: SemanticException 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo

org.apache.hadoop.hive.ql.parse.SemanticException: 
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:/tmp/hbase-huser/hbase/.hbase-snapshot/goods_v3_hbase_snap0/.snapshotinfo

at 
org.apache.hadoop.hive.ql.optimizer.SimpleFetchOptimizer.transform(SimpleFetchOptimiz

Re: ORA-8177 with Hive transactions

2015-09-24 Thread Steve Howard
All,

We continue to struggle with this.  We *never* get the lock, and found one
issue in which the retry logic gets into an infinite loop.  We submitted a
JIRA for that (https://issues.apache.org/jira/browse/HIVE-11934), and
patched our version (HDP 2.3, Hive 1.2.1) with a fix in which the
deadlockCount variable is no longer managed in the lock() method.  That
works, but we still couldn't get the lock, and the exception was thrown
after ten retries.  At least we knew it was broken earlier ;)

We have made the changes to the HIVE transaction tables to enable
ROWDEPENDENCIES, but are still plagued with serialization errors that are
never resolved.
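
For concreteness, the Oracle-side change being discussed is roughly the
following (the values are illustrative; INITRANS can be raised on an existing
table, while ROWDEPENDENCIES can only be declared when a table is created, so
that part means rebuilding the table from the Hive transaction schema script):

ALTER TABLE HIVE_LOCKS INITRANS 16;       -- applies only to newly formatted blocks
ALTER TABLE HIVE_LOCKS MOVE INITRANS 16;  -- rebuilds the table so existing blocks
                                          -- pick it up; indexes on the table then
                                          -- need ALTER INDEX ... REBUILD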

We have only a single writer, as the Hive database environment is used as a
target for an existing EDW dataset.  The job loader is the only one making
changes.  However, it is for analytics, so we have a lot of readers.

We have considered changing TRANSACTION_SERIALIZABLE to READ_COMMITTED for the
dbConn() method call in the TxnHandler class, as Oracle provides consistent
reads.  Of course, the serialization exception is thrown when one thread (a
reader or a writer, I guess) that is locking a hive table (or, in our case,
several hundred daily hive table partitions) attempts to update the row, and
another thread has changed and committed it in the meantime.

Unless I am missing it, this will always be an issue since we have readers and
writers and each appears to take a lock.

If we know we will have a single writer, the largest risk is that the
reader thinks the data hasn't changed, when it has.  For our needs, that
isn't a huge issue.

Are we missing something?  Any ideas?

Thanks,

Steve

On Fri, Sep 18, 2015 at 3:39 PM, Steve Howard 
wrote:

> I think ROWDEPENDENCIES on an Oracle table also covers this issue, so I
> don't think a separate JIRA is needed for the INITRANS change.
>
> On Fri, Sep 18, 2015 at 2:51 PM, Sergey Shelukhin 
> wrote:
>
>> There’s HIVE-11831 
>>  and https://issues.apache.org/jira/browse/HIVE-11833 that try to
>> address this.
>> We can do a patch similar to the first one; can you file a JIRA?
>>
>> From: Steve Howard 
>> Reply-To: "user@hive.apache.org" 
>> Date: Friday, September 18, 2015 at 10:54
>> To: "user@hive.apache.org" 
>> Subject: ORA-8177 with Hive transactions
>>
>> While troubleshooting an issue with transactions shortly after enabling
>> them, I noticed the following in an Oracle trace, which is our metastore
>> for hive...
>>
>> ORA-8177: can't serialize access for this transaction
>>
>> These were thrown on "insert into HIVE_LOCKS..."
>>
>> Traditionally in Oracle, if an application actually needs serializable
>> transactions, the fix is to set initrans and maxtrans to the number of
>> concurrent writers.  When I ran what is below on a table similar to
>> HIVE_LOCKS, this exception was thrown everywhere.  The fix is to recreate
>> the table with higher values for initrans (only 1 is the default for
>> initrans, and 255 is the default for maxtrans).  When I did this and re-ran
>> what is below, the exceptions were no longer thrown.
>>
>> Does anyone have any feedback on this performance hint?  The exceptions
>> in hive are thrown from the checkRetryable method in the TxnHandler class,
>> but I couldn't find what class.method throws them.  Perhaps the exceptions
>> are not impactful, but given the fact the method expects them as it checks
>> for the string in the exception message, I thought I would ask for feedback
>> before we recreate the HIVE_LOCKS table with a higher value for INITRANS.
>>
>> import java.sql.*;public class testLock implements Runnable {
>>   public static void main (String[] args) throws Exception {
>> Class.forName("oracle.jdbc.driver.OracleDriver");
>> for (int i = 1; i <= 100; i++) {
>>   testLock tl = new testLock();
>> }
>>   }
>>
>>   public testLock() {
>> Thread t = new Thread(this);
>> t.start();
>>   }
>>
>>   public void run() {
>> try {
>>   Connection conn = 
>> DriverManager.getConnection("jdbc:oracle:thin:username/pwd@dbhost:1521/dbservice");
>>   conn.createStatement().execute("alter session set isolation_level = 
>> serializable");
>>   PreparedStatement pst = conn.prepareStatement("update test set a = ?");
>>   for (int j = 1; j <= 1; j++) {
>> pst.setInt(1,j);
>> pst.execute();
>> conn.commit();
>> System.out.println("worked");
>>   }
>> }
>> catch (Exception e) {
>>   System.out.println(e.getMessage());
>> }
>>   }}
>>
>>
>


Re: Bucketing is not leveraged in filter push down?

2015-09-24 Thread Edward Capriolo
Right. The big place where bucketing is leveraged is bucket-based joins.
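
A minimal sketch of what that looks like, assuming both tables are bucketed
(and ideally sorted) on the join key into the same number of buckets; the
second table name is made up for illustration:

set hive.optimize.bucketmapjoin = true;

SELECT /*+ MAPJOIN(c) */ s.name, c.course
FROM student_bucketed_2 s
JOIN course_bucketed_2 c ON (s.name = c.name);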

On Thu, Sep 24, 2015 at 3:29 AM, Jeff Zhang  wrote:

> I have one table which is bucketed on column name. Then I have the
> following sql:
>
> - select count(1) from student_bucketed_2 where name = 'calvin
> nixon';
>
> Ideally I think it should only scan one bucket. But from the log I still
> see it will scan all the bucket files. Why bucketing is not leveraged in
> filter push down ?
>
> Here's the log:
>
> 2015-09-24 14:59:22,282 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/01_0
> 2015-09-24 14:59:22,282 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
> 0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
> 2015-09-24 14:59:22,289 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/02_0
> 2015-09-24 14:59:22,289 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
> 0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
> 2015-09-24 14:59:22,296 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/03_0
> 2015-09-24 14:59:22,296 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
> 0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
> 2015-09-24 14:59:22,304 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/04_0
> 2015-09-24 14:59:22,304 INFO [TezChild] exec.Utilities: PLAN PATH = hdfs://
> 0.0.0.0:9000/tmp/hive/jzhang/1ddc9a11-28f8-45e5-aa12-aa457fc9530d/hive_2015-09-24_14-59-04_435_7446484965928171550-1/-mr-10004/b2224650-d31a-4599-8606-57681287bd41/map.xml
> 2015-09-24 14:59:22,311 INFO [TezChild] io.HiveContextAwareRecordReader:
> Processing file hdfs://
> 0.0.0.0:9000/user/hive/warehouse/student_bucketed_2/age=26/05_0
>
>
> --
> Best Regards
>
> Jeff Zhang
>


RE: ORA-8177 with Hive transactions

2015-09-24 Thread Mich Talebzadeh
Yes, I came across this back in April while trying to load 1.7 million rows
from an RDBMS via SAP Replication Server into Hive.

 

My notes were

 

“Trying to sync a table from ASE --> RS --> Hadoop via DIRECT LOAD. The source
has 1.7 million rows and is populating a Hive table. However, I only get
around 50K rows in the Hive table before the MapReduce job gives up and gets
killed.

 

I have turned on concurrency and use an Oracle database as the metastore. The
data, I believe, is delivered in bulk through files shown as rs_temp__nnn
below. I thought that by turning concurrency on in Hive, I would have resolved
the problem.

 

2015-04-16 17:04:34,773 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,784 INFO  [pool-3-thread-197]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(713)) - 158: source:127.0.0.1 get_table : 
db=asehadoop tbl=rs_temp__0x2aaab81d6ab0_t

2015-04-16 17:04:34,784 INFO  [pool-3-thread-197]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(339)) - ugi=hduser ip=127.0.0.1
cmd=source:127.0.0.1 get_table : db=asehadoop tbl=rs_temp__0x2aaab81d6ab0_t

2015-04-16 17:04:34,785 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,798 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,799 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,808 INFO  [pool-3-thread-198]: metastore.HiveMetaStore 
(HiveMetaStore.java:logInfo(713)) - 162: source:127.0.0.1 get_table : 
db=asehadoop tbl=rs_temp__0x2aaab804eda0_t

2015-04-16 17:04:34,809 INFO  [pool-3-thread-198]: HiveMetaStore.audit 
(HiveMetaStore.java:logAuditEvent(339)) - ugi=hduser ip=127.0.0.1
cmd=source:127.0.0.1 get_table : db=asehadoop tbl=rs_temp__0x2aaab804eda0_t

2015-04-16 17:04:34,810 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,813 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,827 ERROR [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(931)) - Too many repeated deadlocks in unlock, 
giving up.

2015-04-16 17:04:34,835 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,839 ERROR [pool-3-thread-199]: metastore.RetryingHMSHandler 
(RetryingHMSHandler.java:invoke(141)) - org.apache.thrift.TException: 

MetaException(message:Unable to update transaction database 
java.sql.SQLException: ORA-08177: can't serialize access for this transaction

 

 

Now, that ORA-08177 relates to the transaction ordering of the bulk data coming
from RS. According to the docs, ORA-08177 can be caused only by serializable
transactions. It means that a row which the serializable transaction is trying
to modify was modified by another transaction after the serializable
transaction has begun.”

 

After a couple of weeks I came up with the following approach

 

“OK guys,

 

Some good news

 

Sounds like setting these two parameters helps!

 

1. -- Parameter "mat_load_tran_size", Default: 1, specifies the optimal
transaction size or batch size for the initial copying of primary data to the
replicate table during direct load materialization.

alter connection to hiveserver2.asehadoop set mat_load_tran_size to "5"

go

 

-- Parameter "max_mat_load_threads", Default: 5, specifies the maximum number 
of load threads for each table being materialized.

alter connection to hiveserver2.asehadoop set max_mat_load_threads to "1"

 

Makes things work without falling over

 

2. Need to have concurrency enabled in Hive metastore. Mine is on Oracle. 
You need to run separate sql against it one labelled like 
hive-txn-schema-0.14.0.oracle.sql after the basic one 
hive-schema-0.14.0.oracle.sql

3. Make sure that concurrency is enabled in Hive; hive.support.concurrency
   is false by default.

4. Once concurrency in Hive is enabled, you need to install and run Apache
ZooKeeper for distributed lock management, otherwise you are going to
encounter deadlock or serialisation issues in your metadata, as below:

2015-04-16 17:04:34,785 WARN  [pool-3-thread-199]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, trying 
again.

2015-04-16 17:04:34,798 WARN  [pool-3-thread-154]: txn.TxnHandler 
(TxnHandler.java:detectDeadlock(928)) - Deadlock detected in unlock, tryi