Thanks guys! It all makes sense now! On Thu, Oct 15, 2009 at 2:42 PM, David Lerman <[email protected]> wrote:
> During the planning process, the PartitionPruner gets the list of all the > partitions from metadata and evaluates the full expression against the > partition definition's fields, including calling any UDFs, to determine > whether it needs to include each partition. If you turn up the logging > level for "hive.ql.optimizer.ppr.PartitionPruner", you can see it walk > through each partition in turn to decide whether to include it. > > On 10/15/09 5:24 PM, "Vijay" <[email protected]> wrote: > > > Thanks Ashish! This is the same approach I'm using as well and it seems > to be > > working very good > > > > One thing I wasn't sure at first but was later surprised was how Hive is > able > > to figure out which partitions to work on using WHERE clauses. What I > mean by > > that is if I do something like WHERE month(ds)=9, it is able to figure > out > > that it needs to just use the partitions 2009-09-01 to 2009-09-30. How > does > > the query engine know this? Does it evaluate partition column related > > expressions locally? > > > > Thanks, > > Vijay > > > > On Tue, Oct 13, 2009 at 2:25 PM, Ashish Thusoo <[email protected]> > wrote: > >> We store the the partitioning as > >> > >> YYYY-MM-DD > >> > >> in that format the string representation of the date has the same > >> lexicographical ordering as the date itself. So if you have that as the > >> format > >> of the string in the ds column (hive does not have date functions yet), > then > >> the expressions of the kind > >> > >> ds >= '2009-08-15' and ds <= '2009-09-15' > >> > >> will pick up the right partitions. > >> > >> For doing counts over the month you can either extract the month from > the > >> date > >> string using the substring(ds, 5, 2) udf in hive or you can use > month(ds) and > >> then put > >> that in the group by clause of the query. > >> > >> Ashish > >> > >> > >> From: Vijay [mailto:[email protected]] > >> Sent: Monday, October 12, 2009 5:05 PM > >> To: [email protected] > >> Subject: Questions on date arithmetic/calculations > >> > >> Hi, > >> > >> I have some basic questions on how hive handles dates and date > arithmetic. I > >> apologize if this has already been addressed. Per most samples on this > site > >> and elsewhere, I can have an access log table defined with a partition > scheme > >> that looks like this: ds='09-08-09'. This is obviously pretty good to > >> partition the data. However, how can this information be used later in > >> queries? For example, if I want to select data for all dates between > 08/15/09 > >> and 09/15/09, how would I do that? The partition column ds cannot be > used > >> with > >>> = and similar operators right? Additionally, when is partitioned this > way, > >> how can I do counts on month, etc? Obviously all of these queries need > to be > >> expressed in a way hive can still take advantage of the partitioning > scheme. > >> I > >> hope that makes sense. > >> > >> Thanks, > >> Vijay > > > >
