Re: Running hive queries in different queue

2016-02-28 Thread Rajit Saha
Thanks a lot Sathi.


I also found that if the Hive execution engine is MapReduce,
set mapreduce.job.queuename=<queue_name>;
works.

If the Hive execution engine is Tez, we need to do
set tez.queue.name=<queue_name>;
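
Both properties can also be passed from the client command line with
--hiveconf. A minimal sketch (the queue name "etl" and the script
my_query.hql are just placeholders):

# Hive execution engine = MapReduce
hive --hiveconf mapreduce.job.queuename=etl -f my_query.hql

# Hive execution engine = Tez
hive --hiveconf tez.queue.name=etl -f my_query.hql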



Cheers

Rajit Saha

Principal DevOps Engineer | BigData
LendingClub




From: Sathi Chowdhury <sathi.chowdh...@lithium.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, February 26, 2016 at 6:01 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Running hive queries in different queue

I think in your hive script you can do
set mapreduce.job.queuename=<queue_name>;
Thanks
Sathi

From: Rajit Saha
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, February 26, 2016 at 5:34 PM
To: "user@hive.apache.org<mailto:user@hive.apache.org>"
Subject: Running hive queries in different queue

Hi

I want to run a hive query in a queue other than the "default" queue from the
hive client command line. Can anybody please suggest a way to do it?

Regards
Rajit

Re: Running hive queries in different queue

2016-02-27 Thread Mich Talebzadeh
Hello.

Which Hive client are you using? Beeline?
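
If it is beeline, one way (a rough sketch; the HiveServer2 URL, the queue name
"etl" and the query are placeholders) is to pass the property with --hiveconf
at connection time:

beeline -u jdbc:hive2://hs2-host:10000/default \
  --hiveconf mapreduce.job.queuename=etl \
  -e "SELECT COUNT(*) FROM my_table;"

(With Tez as the execution engine, tez.queue.name would be the property to set
instead.)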

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 27 February 2016 at 01:34, Rajit Saha  wrote:

> Hi
>
> I want to run a hive query in a queue other than the "default" queue from
> the hive client command line. Can anybody please suggest a way to do it?
>
> Regards
> Rajit

Re: Running hive queries in different queue

2016-02-26 Thread Sathi Chowdhury
I think in your hive script you can do
set mapreduce.job.queuename=<queue_name>;
Thanks
Sathi

From: Rajit Saha
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Friday, February 26, 2016 at 5:34 PM
To: "user@hive.apache.org<mailto:user@hive.apache.org>"
Subject: Running hive queries in different queue

Hi

I want to run a hive query in a queue other than the "default" queue from the
hive client command line. Can anybody please suggest a way to do it?

Regards
Rajit

Running hive queries in different queue

2016-02-26 Thread Rajit Saha
Hi

I want to run a hive query in a queue other than the "default" queue from the
hive client command line. Can anybody please suggest a way to do it?

Regards
Rajit

On Feb 26, 2016, at 07:36, Patrick Duin <patd...@gmail.com> wrote:

Hi Prasanth.

Thanks for the quick reply!

The logs don't show much more of the stacktrace I'm afraid:
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:809)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


The stacktrace isn't really the issue though. The NullPointerException is a
symptom of not being able to return any stripes: if you look at that line in
the code, it fails because the 'stripes' field is null, which should never
happen. This, we think, is caused by failing namenode network traffic. We
would see lots of I/O warnings in the logs saying blocks cannot be found,
e.g.:
16/02/01 13:20:34 WARN hdfs.BlockReaderFactory: I/O error constructing remote 
block reader.
java.io.IOException: java.lang.InterruptedException
at org.apache.hadoop.ipc.Client.call(Client.java:1448)
at org.apache.hadoop.ipc.Client.call(Client.java:1400)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy32.getServerDefaults(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getServerDefaults(ClientNamenodeProtocolTranslatorPB.java:268)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy33.getServerDefaults(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:1007)
at 
org.apache.hadoop.hdfs.DFSClient.shouldEncryptData(DFSClient.java:2062)
at 
org.apache.hadoop.hdfs.DFSClient.newDataEncryptionKey(DFSClient.java:2068)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:208)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:159)
at 
org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:90)
at 
org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3123)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
at 
org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
at 
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:848)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:407)
at 
org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:311)
at 
org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:885)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:771)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
at java.util.concurrent.FutureTask.get(FutureTask.java:187)
at 
org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1047)
at org.apache.hadoop.ipc.Client.call(Client.java:1442)
... 33 more

Our job doesn't always fail; sometimes splits do get calculated. We suspect
that when the namenode is too busy our job hits some time-outs and the whole
thing fails.
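
If it turns out that footer reads during split generation are what is
hammering the namenode, two knobs we may experiment with (assuming our Hive
version exposes them; the queue-less invocation and my_query.hql are just
placeholders) are the ORC split strategy and the number of split-computation
threads:

# BI strategy generates splits from HDFS block boundaries without reading
# ORC footers; fewer threads means fewer concurrent namenode calls.
hive --hiveconf hive.exec.orc.split.strategy=BI \
  --hiveconf hive.orc.compute.splits.num.threads=4 \
  -f my_query.hql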

Our intuition has been the same as you suggest: bigger files are better. But
we see a degradation in performance as soon as our files get bigger than the
ORC bl