Re: Doubt on Hive Partitioning.

2016-08-02 Thread Jörn Franke
Partition pruning also works with older Hive versions, but you have to put the
filter in the join clause and not only in the WHERE clause.
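
A minimal sketch of what that placement could look like. Table t and partition column p are from the original question; the second table d and the columns id, c1 and c2 are hypothetical names used only for illustration.

```sql
-- Filter only in the WHERE clause: the form reported in this thread as
-- not always being pruned on older Hive versions.
SELECT t.c1, d.c2
FROM t
JOIN d ON t.id = d.id
WHERE t.p = '201604';

-- Same query with the partition filter repeated in the join condition.
-- For an inner join this returns the same rows as the query above.
SELECT t.c1, d.c2
FROM t
JOIN d ON t.id = d.id AND t.p = '201604'
WHERE t.p = '201604';
```

Note that for outer joins, moving a condition between ON and WHERE can change the result, so the two forms are only interchangeable for inner joins.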

> On 02 Aug 2016, at 09:53, Furcy Pin  wrote:
> 
> I'm using Hive 1.1 on MR and dynamic partition pruning does not seem to work.
> 
> Since MR is deprecated in 2.0, I assume we should not expect any future perf 
> optimisation on this side.
> 
> It has been implemented for Hive on Spark, though.
> https://issues.apache.org/jira/browse/HIVE-9152
> 
> 
> 
> 
>> On Tue, Aug 2, 2016 at 3:45 AM, Qiuzhuang Lian wrote:
>> Is this partition pruning also fixed in MR, and not only in Tez, in newer
>> Hive versions?
>> 
>> Regards,
>> Q
>> 
>>> On Mon, Aug 1, 2016 at 8:48 PM, Jörn Franke wrote:
>>> It happens in old Hive versions if the filter is only in the WHERE clause
>>> and NOT in the join clause. This should not happen in newer Hive versions.
>>> You can check it by executing an EXPLAIN DEPENDENCY query.
>>> 
>>>> On 01 Aug 2016, at 11:07, Abhishek Dubey wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> I have a very big table t with billions of rows, and it is partitioned on a
>>>> column p. Column p has datatype text and values like ‘201601’,
>>>> ‘201602’… up to ‘201612’.
>>>> 
>>>> I am running a query like: SELECT columns FROM t WHERE p='201604'.
>>>> 
>>>> My question is: can there be a scenario or condition in which my query
>>>> does a complete table scan on t instead of reading only the data for the
>>>> specified partition key? If yes, please shed some light on those scenarios.
>>>> 
>>>> I’m asking this because someone told me there is a chance that the query
>>>> will ignore the partitioning and do a complete table scan to fetch the
>>>> output.
>>>> 
>>>> Thanks & Regards,
>>>> Abhishek Dubey
> 


Re: Doubt on Hive Partitioning.

2016-08-02 Thread Jörn Franke
I do not think so, but never tested it.
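
One way to test it would be the EXPLAIN DEPENDENCY check suggested earlier in this thread, run with the MR engine selected. This is only a sketch: t and p are from the original question, and setting the engine explicitly is an assumption about how one would target MR.

```sql
-- Select the MapReduce execution engine for this session.
SET hive.execution.engine=mr;

-- Prints, as JSON, the tables and partitions the query would read.
EXPLAIN DEPENDENCY
SELECT * FROM t WHERE p = '201604';
```

If the input_partitions entry in the output lists only the partition for p=201604, pruning was applied; if it lists every partition of t, the query would scan the whole table.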

> On 02 Aug 2016, at 03:45, Qiuzhuang Lian  wrote:
> 
> Is this partition pruning also fixed in MR, and not only in Tez, in newer Hive
> versions?
> 
> Regards,
> Q


Re: Doubt on Hive Partitioning.

2016-08-02 Thread Furcy Pin
I'm using Hive 1.1 on MR and dynamic partition pruning does not seem to
work.

Since MR is deprecated in 2.0, I assume we should not expect any future
perf optimisation on this side.

It has been implemented for Hive on Spark, though.
https://issues.apache.org/jira/browse/HIVE-9152
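
For context, dynamic partition pruning differs from the static case in the original question: the partitions to read are not given by a literal such as p='201604' but are derived at run time from another table. A rough sketch, assuming Hive on Spark in a version that includes HIVE-9152; the table d and the columns c1 and label are hypothetical.

```sql
-- Assumed setup: Hive on Spark with dynamic partition pruning switched on.
SET hive.execution.engine=spark;
SET hive.spark.dynamic.partition.pruning=true;

-- The values of t.p that need to be read are only known after d is filtered,
-- so pruning has to happen at run time rather than at compile time.
SELECT t.c1
FROM t
JOIN d ON t.p = d.p
WHERE d.label = 'spring';
```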




On Tue, Aug 2, 2016 at 3:45 AM, Qiuzhuang Lian 
wrote:

> Is this partition pruning also fixed in MR, and not only in Tez, in newer Hive
> versions?
>
> Regards,
> Q