Thanks Ashish! This is the same approach I'm using as well and it seems to be working very good
One thing I wasn't sure at first but was later surprised was how Hive is able to figure out which partitions to work on using WHERE clauses. What I mean by that is if I do something like WHERE month(ds)=9, it is able to figure out that it needs to just use the partitions 2009-09-01 to 2009-09-30. How does the query engine know this? Does it evaluate partition column related expressions locally? Thanks, Vijay On Tue, Oct 13, 2009 at 2:25 PM, Ashish Thusoo <[email protected]> wrote: > We store the the partitioning as > > YYYY-MM-DD > > in that format the string representation of the date has the same > lexicographical ordering as the date itself. So if you have that as the > format of the string in the ds column (hive does not have date functions > yet), then the expressions of the kind > > ds >= '2009-08-15' and ds <= '2009-09-15' > > will pick up the right partitions. > > For doing counts over the month you can either extract the month from the > date string using the substring(ds, 5, 2) udf in hive or you can use > month(ds) and then put > that in the group by clause of the query. > > Ashish > > ------------------------------ > *From:* Vijay [mailto:[email protected]] > *Sent:* Monday, October 12, 2009 5:05 PM > *To:* [email protected] > *Subject:* Questions on date arithmetic/calculations > > Hi, > > I have some basic questions on how hive handles dates and date arithmetic. > I apologize if this has already been addressed. Per most samples on this > site and elsewhere, I can have an access log table defined with a partition > scheme that looks like this: ds='09-08-09'. This is obviously pretty good to > partition the data. However, how can this information be used later in > queries? For example, if I want to select data for all dates between > 08/15/09 and 09/15/09, how would I do that? The partition column ds cannot > be used with >= and similar operators right? Additionally, when is > partitioned this way, how can I do counts on month, etc? Obviously all of > these queries need to be expressed in a way hive can still take advantage of > the partitioning scheme. I hope that makes sense. > > Thanks, > Vijay >
