Thanks guys! It all makes sense now!

On Thu, Oct 15, 2009 at 2:42 PM, David Lerman <[email protected]> wrote:

> During the planning process, the PartitionPruner gets the list of all the
> partitions from metadata and evaluates the full expression against the
> partition definition's fields, including calling any UDFs, to determine
> whether it needs to include each partition.  If you turn up the logging
> level for "hive.ql.optimizer.ppr.PartitionPruner", you can see it walk
> through each partition in turn to decide whether to include it.
>
> On 10/15/09 5:24 PM, "Vijay" <[email protected]> wrote:
>
> > Thanks Ashish! This is the same approach I'm using as well and it seems
> to be
> > working very good
> >
> > One thing I wasn't sure at first but was later surprised was how Hive is
> able
> > to figure out which partitions to work on using WHERE clauses. What I
> mean by
> > that is if I do something like WHERE month(ds)=9, it is able to figure
> out
> > that it needs to just use the partitions 2009-09-01 to 2009-09-30. How
> does
> > the query engine know this? Does it evaluate partition column related
> > expressions locally?
> >
> > Thanks,
> > Vijay
> >
> > On Tue, Oct 13, 2009 at 2:25 PM, Ashish Thusoo <[email protected]>
> wrote:
> >> We store the the partitioning as
> >>
> >> YYYY-MM-DD
> >>
> >> in that format the string representation of the date has the same
> >> lexicographical ordering as the date itself. So if you have that as the
> >> format
> >> of the string in the ds column (hive does not have date functions yet),
> then
> >> the expressions of the kind
> >>
> >> ds >= '2009-08-15' and ds <= '2009-09-15'
> >>
> >> will pick up the right partitions.
> >>
> >> For doing counts over the month you can either extract the month from
> the
> >> date
> >> string using the substring(ds, 5, 2) udf in hive or you can use
> month(ds) and
> >> then put
> >> that in the group by clause of the query.
> >>
> >> Ashish
> >>
> >>
> >> From: Vijay [mailto:[email protected]]
> >> Sent: Monday, October 12, 2009 5:05 PM
> >> To: [email protected]
> >> Subject: Questions on date arithmetic/calculations
> >>
> >> Hi,
> >>
> >> I have some basic questions on how hive handles dates and date
> arithmetic. I
> >> apologize if this has already been addressed. Per most samples on this
> site
> >> and elsewhere, I can have an access log table defined with a partition
> scheme
> >> that looks like this: ds='09-08-09'. This is obviously pretty good to
> >> partition the data. However, how can this information be used later in
> >> queries? For example, if I want to select data for all dates between
> 08/15/09
> >> and 09/15/09, how would I do that? The partition column ds cannot be
> used
> >> with
> >>> = and similar operators right? Additionally, when is partitioned this
> way,
> >> how can I do counts on month, etc? Obviously all of these queries need
> to be
> >> expressed in a way hive can still take advantage of the partitioning
> scheme.
> >> I
> >> hope that makes sense.
> >>
> >> Thanks,
> >> Vijay
> >
>
>

Reply via email to