Re: SARG predicate is ignored when querying ORC table

2016-02-27 Thread Jie Zhang
Hi, Mich,

Thanks for the reply. We don't set any TBLPROPERTIES when creating the table.
Here is the TBLPROPERTIES part from SHOW CREATE TABLE:

STORED AS ORC
TBLPROPERTIES ('transient_lastDdlTime'='1455765074')
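
(So only the default transient_lastDdlTime is there. For comparison, a table
that set ORC-level properties explicitly would show something like the sketch
below; the property names are the standard ORC ones, and the values, columns
and location are purely illustrative, not what we use:)

  -- orc.create.index controls the row-group indexes the SARG is evaluated against
  CREATE EXTERNAL TABLE example_orc (id BIGINT, event_date STRING)
  STORED AS ORC
  LOCATION '/path/to/orc/files'
  TBLPROPERTIES (
    'orc.compress'='ZLIB',
    'orc.create.index'='true',
    'orc.row.index.stride'='10000'
  );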

Jessica


On Sat, Feb 27, 2016 at 11:15 AM, Mich Talebzadeh  wrote:

> Hi,
>
> Can you do show create table  on your external table and send the
> sections from
>
> STORED AS ORC
> TBLPROPERTIES (
>
> onwards please?
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 27 February 2016 at 18:59, Jie Zhang  wrote:
>
>> Hi,
>>
>> We have an external ORC table which includes ~200 relatively small ORC
>> files (each less than 256MB). When querying the table with a selective SARG
>> predicate (EXPLAIN shows the predicate qualifies for pushdown), we expect
>> only a few splits to be generated, pruned on the predicate condition, so
>> that only a few files are scanned. However, predicate pushdown does not take
>> effect at all: every file is scanned in the MR job, and the SARG does not
>> even show up in the MR job config.
>>
>> After digging into the Hive code (version 0.14), it looks like split
>> pruning only happens for the stripes within each file; if the file size is
>> smaller than the default split size, the SARG is not considered at all. Here
>> is the code we are referring to:
>>
>> https://github.com/apache/hive/blob/release-0.14.0/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L656
>>
>>
>> Any idea why the SARG is ignored in this scenario? Also, can split pruning
>> filter out files in which no stripe satisfies the SARG condition?
>> Thanks for any help, much appreciated.
>>
>> Jessica
>>
>
>


Re: SARG predicate is ignored when querying ORC table

2016-02-27 Thread Mich Talebzadeh
Hi,

Can you do show create table  on your external table and send the
sections from

STORED AS ORC
TBLPROPERTIES (

onwards please?

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 27 February 2016 at 18:59, Jie Zhang  wrote:

> Hi,
>
> We have an external ORC table which includes ~200 relatively small ORC
> files (each less than 256MB). When querying the table with a selective SARG
> predicate (EXPLAIN shows the predicate qualifies for pushdown), we expect
> only a few splits to be generated, pruned on the predicate condition, so
> that only a few files are scanned. However, predicate pushdown does not take
> effect at all: every file is scanned in the MR job, and the SARG does not
> even show up in the MR job config.
>
> After digging into the Hive code (version 0.14), it looks like split
> pruning only happens for the stripes within each file; if the file size is
> smaller than the default split size, the SARG is not considered at all. Here
> is the code we are referring to:
>
> https://github.com/apache/hive/blob/release-0.14.0/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L656
>
>
> Any idea why the SARG is ignored in this scenario? Also, can split pruning
> filter out files in which no stripe satisfies the SARG condition?
> Thanks for any help, much appreciated.
>
> Jessica
>


SARG predicate is ignored when querying ORC table

2016-02-27 Thread Jie Zhang
Hi,

We have an external ORC table which includes ~200 relatively small ORC files
(each less than 256MB). When querying the table with a selective SARG predicate
(EXPLAIN shows the predicate qualifies for pushdown), we expect only a few
splits to be generated, pruned on the predicate condition, so that only a few
files are scanned. However, predicate pushdown does not take effect at all:
every file is scanned in the MR job, and the SARG does not even show up in the
MR job config.

After digging into the Hive code (version 0.14), it looks like split pruning
only happens for the stripes within each file; if the file size is smaller than
the default split size, the SARG is not considered at all. Here is the code we
are referring to:
https://github.com/apache/hive/blob/release-0.14.0/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L656
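
For reference, these are the session-level knobs that seem relevant (a sketch
only: hive.optimize.index.filter and the max-split-size property are standard
settings, but whether lowering the max split size below the file size actually
changes the pruning behaviour on 0.14 is an assumption on our side, and the
table name and predicate below are placeholders):

  -- let the SARG be pushed down to the ORC reader
  SET hive.optimize.index.filter=true;
  -- try a max split size below the file size (64MB here); the MRv2 name is
  -- mapreduce.input.fileinputformat.split.maxsize
  SET mapred.max.split.size=67108864;
  SELECT col1, col2 FROM orc_table WHERE event_date = '2016-02-01';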


Any idea why the SARG is ignored in this scenario? Also, can split pruning
filter out files in which no stripe satisfies the SARG condition?
Thanks for any help, much appreciated.

Jessica


Re: Running hive queries in different queue

2016-02-27 Thread Mich Talebzadeh
Hello.

What Hive client are you using? Beeline?
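
If it is beeline (or the hive CLI) on the MR execution engine, the queue can
usually be chosen per session or at launch time. A sketch, where "etl" and
<host> are placeholders:

  -- inside the session (mapred.job.queue.name on older Hadoop releases)
  SET mapreduce.job.queuename=etl;

  -- or when starting the client:
  beeline -u jdbc:hive2://<host>:10000 --hiveconf mapreduce.job.queuename=etl
  hive --hiveconf mapreduce.job.queuename=etl -e "SELECT count(*) FROM some_table;"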

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 27 February 2016 at 01:34, Rajit Saha  wrote:

> Hi
>
> I want to run a Hive query in a queue other than the "default" queue from
> the Hive client command line. Can anybody please suggest a way to do it?
>
> Regards
> Rajit
>
> On Feb 26, 2016, at 07:36, Patrick Duin  wrote:
>
> Hi Prasanth.
>
> Thanks for the quick reply!
>
> The logs don't show much more of the stacktrace I'm afraid:
> java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:809)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
>
> The stacktrace isn't really the issue, though. The NullPointerException is a
> symptom of not being able to return any stripes: if you look at that line in
> the code, it fails because the 'stripes' field is null, which should never
> happen. This, we think, is caused by failing namenode network traffic. We see
> lots of IO warnings in the logs saying blocks cannot be found, e.g.:
> 16/02/01 13:20:34 WARN hdfs.BlockReaderFactory: I/O error constructing
> remote block reader.
> java.io.IOException: java.lang.InterruptedException
> at org.apache.hadoop.ipc.Client.call(Client.java:1448)
> at org.apache.hadoop.ipc.Client.call(Client.java:1400)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy32.getServerDefaults(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getServerDefaults(ClientNamenodeProtocolTranslatorPB.java:268)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy33.getServerDefaults(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:1007)
> at
> org.apache.hadoop.hdfs.DFSClient.shouldEncryptData(DFSClient.java:2062)
> at
> org.apache.hadoop.hdfs.DFSClient.newDataEncryptionKey(DFSClient.java:2068)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:208)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:159)
> at
> org.apache.hadoop.hdfs.net.TcpPeerServer.peerFromSocketAndKey(TcpPeerServer.java:90)
> at
> org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3123)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:755)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:670)
> at
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:337)
> at
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
> at
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
> at
> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:848)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:407)
> at
> org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:311)
> at
> org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:228)
> at
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:885)
> at
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.run(OrcInputFormat.java:771)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.InterruptedException
> at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
> at