Re: Doubt on Hive Partitioning.
Partition pruning also works with older Hive versions, but you have to put the filter in the join clause and not only in the WHERE clause.

> On 02 Aug 2016, at 09:53, Furcy Pin wrote:
>
> I'm using Hive 1.1 on MR and dynamic partition pruning does not seem to work.
>
> Since MR is deprecated in 2.0, I assume we should not expect any future perf
> optimisation on this side.
>
> It has been implemented for Hive on Spark, though.
> https://issues.apache.org/jira/browse/HIVE-9152
>
>> On Tue, Aug 2, 2016 at 3:45 AM, Qiuzhuang Lian wrote:
>> Is this partition pruning also fixed in MR, and not only in Tez, in newer
>> Hive versions?
>>
>> Regards,
>> Q
>>
>>> On Mon, Aug 1, 2016 at 8:48 PM, Jörn Franke wrote:
>>> It happens in old Hive versions if the filter is only in the WHERE clause
>>> and NOT in the join clause. This should not happen in newer Hive versions.
>>> You can check it by running an EXPLAIN DEPENDENCY query.
>>>
>>>> On 01 Aug 2016, at 11:07, Abhishek Dubey wrote:
>>>>
>>>> Hi All,
>>>>
>>>> I have a very big table t with billions of rows, and it is partitioned
>>>> on a column p. Column p has datatype text and values like '201601',
>>>> '201602' … up to '201612'.
>>>>
>>>> And I am running a query like: SELECT columns FROM t WHERE p='201604'.
>>>>
>>>> My question is: can there be a scenario or condition in which my query
>>>> will do a complete table scan on t instead of only reading data for the
>>>> specified partition key? If yes, please shed some light on those
>>>> scenarios.
>>>>
>>>> I'm asking this because someone told me that there is a chance that the
>>>> query will ignore the partitioning and do a complete table scan to fetch
>>>> the output.
>>>>
>>>> Thanks & Regards,
>>>> Abhishek Dubey
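A minimal HiveQL sketch of the filter-placement advice in this message, for an inner join. Only t and p come from the thread; the second table d and the columns id, col1 and col2 are hypothetical names chosen for illustration.

```sql
-- Assumption: d, id, col1 and col2 are illustrative names, not from the thread.

-- Filter only in WHERE: this is the form reported in the thread as
-- possibly scanning every partition of t on older Hive versions.
SELECT t.col1, d.col2
FROM t
JOIN d
  ON t.id = d.id
WHERE t.p = '201604';

-- Same inner join with the partition filter repeated in the ON clause,
-- so the pruning condition is part of the join condition itself.
SELECT t.col1, d.col2
FROM t
JOIN d
  ON t.id = d.id
 AND t.p = '201604'
WHERE t.p = '201604';
```

For an inner join the two queries return the same rows. For outer joins, moving a filter into the ON clause can change the result, so that rewrite needs to be checked case by case.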
Re: Doubt on Hive Partitioning.
I do not think so, but I have never tested it.

> On 02 Aug 2016, at 03:45, Qiuzhuang Lian wrote:
>
> Is this partition pruning also fixed in MR, and not only in Tez, in newer
> Hive versions?
>
> Regards,
> Q
>
>> On Mon, Aug 1, 2016 at 8:48 PM, Jörn Franke wrote:
>> It happens in old Hive versions if the filter is only in the WHERE clause
>> and NOT in the join clause. This should not happen in newer Hive versions.
>> You can check it by running an EXPLAIN DEPENDENCY query.
>>
>>> On 01 Aug 2016, at 11:07, Abhishek Dubey wrote:
>>>
>>> Hi All,
>>>
>>> I have a very big table t with billions of rows, and it is partitioned
>>> on a column p. Column p has datatype text and values like '201601',
>>> '201602' … up to '201612'.
>>>
>>> And I am running a query like: SELECT columns FROM t WHERE p='201604'.
>>>
>>> My question is: can there be a scenario or condition in which my query
>>> will do a complete table scan on t instead of only reading data for the
>>> specified partition key? If yes, please shed some light on those
>>> scenarios.
>>>
>>> I'm asking this because someone told me that there is a chance that the
>>> query will ignore the partitioning and do a complete table scan to fetch
>>> the output.
>>>
>>> Thanks & Regards,
>>> Abhishek Dubey
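One way to test this rather than guess, following the EXPLAIN DEPENDENCY suggestion quoted in this message, is to ask Hive which inputs the original query would read. This is only a sketch: col1 is a hypothetical column name standing in for the unspecified "columns" in the question.

```sql
-- Assumption: col1 stands in for the unspecified column list in the question.
-- EXPLAIN DEPENDENCY does not run the query; it reports, as JSON, the tables
-- and partitions the query depends on.
EXPLAIN DEPENDENCY
SELECT col1
FROM t
WHERE p = '201604';
```

If pruning is applied, the input_partitions list in the output should name only the p=201604 partition. A list containing every partition of t would suggest a full table scan.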
Re: Doubt on Hive Partitioning.
I'm using Hive 1.1 on MR and dynamic partition pruning does not seem to work.

Since MR is deprecated in 2.0, I assume we should not expect any future perf optimisation on this side.

It has been implemented for Hive on Spark, though.
https://issues.apache.org/jira/browse/HIVE-9152

On Tue, Aug 2, 2016 at 3:45 AM, Qiuzhuang Lian wrote:

> Is this partition pruning also fixed in MR, and not only in Tez, in newer
> Hive versions?
>
> Regards,
> Q
>
> On Mon, Aug 1, 2016 at 8:48 PM, Jörn Franke wrote:
>
>> It happens in old Hive versions if the filter is only in the WHERE clause
>> and NOT in the join clause. This should not happen in newer Hive versions.
>> You can check it by running an EXPLAIN DEPENDENCY query.
>>
>> On 01 Aug 2016, at 11:07, Abhishek Dubey wrote:
>>
>> Hi All,
>>
>> I have a very big table t with billions of rows, and it is partitioned
>> on a column p. Column p has datatype text and values like '201601',
>> '201602' … up to '201612'.
>>
>> And I am running a query like: SELECT columns FROM t WHERE p='201604'.
>>
>> My question is: can there be a scenario or condition in which my query
>> will do a complete table scan on t instead of only reading data for the
>> specified partition key? If yes, please shed some light on those
>> scenarios.
>>
>> I'm asking this because someone told me that there is a chance that the
>> query will ignore the partitioning and do a complete table scan to fetch
>> the output.
>>
>> Thanks & Regards,
>> Abhishek Dubey
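For reference, a sketch of the session settings related to dynamic partition pruning on the non-MR engines mentioned in this message. The property names are given to the best of my knowledge of the Hive configuration documentation, and their availability depends on the Hive version (the Spark one arrived with HIVE-9152), so treat them as assumptions to verify against your installation.

```sql
-- Assumption: these properties exist in your Hive build; running
-- `SET <property>;` with no value prints the current setting.

-- Hive on Spark: dynamic partition pruning switch added by HIVE-9152.
SET hive.execution.engine=spark;
SET hive.spark.dynamic.partition.pruning=true;

-- Hive on Tez: the corresponding switch.
SET hive.execution.engine=tez;
SET hive.tez.dynamic.partition.pruning=true;
```

Dynamic partition pruning concerns joins where the partition values are only known at run time, from the other side of the join. A literal predicate such as p='201604' in the original question is a case of static pruning, which the planner can resolve at compile time.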