We store the the partitioning as

YYYY-MM-DD

in that format the string representation of the date has the same 
lexicographical ordering as the date itself. So if you have that as the format 
of the string in the ds column (hive does not have date functions yet), then 
the expressions of the kind

ds >= '2009-08-15' and ds <= '2009-09-15'

will pick up the right partitions.

For doing counts over the month you can either extract the month from the date 
string using the substring(ds, 5, 2) udf in hive or you can use month(ds) and 
then put
that in the group by clause of the query.

Ashish

________________________________
From: Vijay [mailto:[email protected]]
Sent: Monday, October 12, 2009 5:05 PM
To: [email protected]
Subject: Questions on date arithmetic/calculations

Hi,

I have some basic questions on how hive handles dates and date arithmetic. I 
apologize if this has already been addressed. Per most samples on this site and 
elsewhere, I can have an access log table defined with a partition scheme that 
looks like this: ds='09-08-09'. This is obviously pretty good to partition the 
data. However, how can this information be used later in queries? For example, 
if I want to select data for all dates between 08/15/09 and 09/15/09, how would 
I do that? The partition column ds cannot be used with >= and similar operators 
right? Additionally, when is partitioned this way, how can I do counts on 
month, etc? Obviously all of these queries need to be expressed in a way hive 
can still take advantage of the partitioning scheme. I hope that makes sense.

Thanks,
Vijay

Reply via email to