[
https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837097#comment-13837097
]
Eric Hanson commented on HIVE-5761:
-----------------------------------
We should just use the number of days since the epoch as the representation of
DATE in a vector. This will allow you to re-use all the VectorExpressions to do
<, >, <=, >=, =, and !=, like LongColEqualLongScalar,
FilterLongColEqualLongScalar, and dozens of others, rather than implement new
ones. If you play tricks and try to cache data inside the vector elements, you
will have to re-implement all the comparison operations -- that is too much
work.
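
To illustrate the point above, here is a minimal sketch, not Hive's actual code. The class DateAsEpochDays and both of its methods are hypothetical names made up for this example; the comparison method only stands in for an existing long-column expression such as LongColEqualLongScalar. It assumes DATE values are stored in a long[] as days since 1970-01-01, so a plain long comparison is also a correct date comparison.

```java
import java.time.LocalDate;

// Hypothetical illustration only; not part of Hive.
final class DateAsEpochDays {

    // Convert an ISO date string (yyyy-MM-dd) to days since 1970-01-01.
    static long toEpochDays(String isoDate) {
        return LocalDate.parse(isoDate).toEpochDay();
    }

    // Stand-in for a generic long-column-vs-scalar equality expression.
    // It knows nothing about dates: it only compares longs, writing 1 for
    // a match and 0 otherwise.
    static long[] longColEqualLongScalar(long[] col, long scalar) {
        long[] out = new long[col.length];
        for (int i = 0; i < col.length; i++) {
            out[i] = (col[i] == scalar) ? 1L : 0L;
        }
        return out;
    }
}
```

Because day numbers preserve date order, the same reasoning applies to <, >, <=, >=, and !=: no date-specific comparison code is needed.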
But using an external cache to accelerate the operations like getWeek,
getMonth, getYear, etc. is a good idea. You could implement a cache as a
separate data structure that is a member variable of the VectorExpression class
used to implement an operation like getYear. For example, an array of about 8000
elements could hold the precomputed results of a date function for every day
within plus or minus 11 years. You don't even need to hash: just use the day
integer to compute the cache array index with a direct formula (the day number
minus the first cached day). For outliers outside the
size of your cache array, you could just fall back on a slower path to do the
full computation. You could rely on the fact that almost all date values in the
user data will be between, say, today -18 years and today + 2 years.
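
The cache idea could be sketched roughly as follows. This is an assumption-laden illustration, not Hive code: the class YearCache, its fields, and its methods are hypothetical, and java.time.LocalDate is used here only as a convenient slow path. The base day and array size are chosen by the caller.

```java
import java.time.LocalDate;

// Hypothetical illustration only; not part of Hive.
final class YearCache {
    private final long baseDay;  // epoch day of the first cached entry
    private final int[] years;   // years[i] = year of (baseDay + i)

    YearCache(long baseDay, int size) {
        this.baseDay = baseDay;
        this.years = new int[size];
        for (int i = 0; i < size; i++) {
            years[i] = LocalDate.ofEpochDay(baseDay + i).getYear();
        }
    }

    // Direct-index lookup, with a slow path for days outside the cache.
    int getYear(long epochDay) {
        long idx = epochDay - baseDay;
        if (idx >= 0 && idx < years.length) {
            return years[(int) idx];                      // fast path: no hashing
        }
        return LocalDate.ofEpochDay(epochDay).getYear();  // slow path: full computation
    }

    // Apply getYear to the first n entries of a vector of epoch days.
    void evaluate(long[] days, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = getYear(days[i]);
        }
    }
}
```

For example, new YearCache(LocalDate.now().minusYears(18).toEpochDay(), 8000) would cover roughly today - 18 years through today + 2 years with room to spare. A real expression class would presumably keep one such cache per function (getYear, getMonth, getWeek) as a member variable.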
> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
> Key: HIVE-5761
> URL: https://issues.apache.org/jira/browse/HIVE-5761
> Project: Hive
> Issue Type: Sub-task
> Reporter: Eric Hanson
> Assignee: Teddy Choi
>
> Add support to allow queries referencing DATE columns and expression results
> to run efficiently in vectorized mode. This should re-use the code for the
> integer/timestamp types to the extent possible and beneficial. Include
> unit tests and end-to-end tests. Consider re-using or extending existing
> end-to-end tests for vectorized integer and/or timestamp operations.