[
https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834489#comment-13834489
]
Teddy Choi commented on HIVE-5761:
----------------------------------
Eric,
I researched the history of the Hive DATE data type.
1. DATE in ORC: HIVE-4055 already implemented it. It uses an integer field,
DateWritable#daysSinceEpoch, to represent a date. I think there is little
chance of switching to the alternative representation, which I would prefer.
2. Basic operations: We may need to go through java.sql.Date for every
operation. [~thejas] and [~jdere] already suggested the Joda-Time library,
which is significantly faster, but there were objections to additional
dependencies in HIVE-3910.
3. Complex operations: Fortunately, they will benefit from the
DateWritable#daysSinceEpoch representation.
4. Vectorized plan: I'm not sure yet. I will run some tests.
The key question is how to improve the performance of basic operations on
DateWritable#daysSinceEpoch. I found that org.joda.time.Chronology does not
create objects during repetitive calculations
(http://stackoverflow.com/questions/6465330/any-good-high-performance-java-library-that-works-with-timestamp).
That gives me an idea, but it looks hard to implement.
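To make the allocation-free idea concrete, here is a rough sketch of extracting date fields from a days-since-epoch integer using only integer arithmetic. It is not Hive or Joda-Time code: the class and method names (EpochDayFields, toPackedDate, yearOf) are made up for illustration, the plain long[] arrays stand in for a column vector, and the arithmetic is the well-known proleptic Gregorian days-to-civil-date conversion rather than anything taken from Chronology. It assumes years of 1 or later and day counts that do not overflow an int.

```java
final class EpochDayFields {
    private EpochDayFields() {}

    /**
     * Converts days since 1970-01-01 to a packed int: year*10000 + month*100 + day.
     * Proleptic Gregorian calendar, no time zone, no object allocation.
     */
    static int toPackedDate(int daysSinceEpoch) {
        int z = daysSinceEpoch + 719468;          // re-base to 0000-03-01
        int era = Math.floorDiv(z, 146097);       // index of the 400-year cycle
        int doe = z - era * 146097;               // day within the cycle [0, 146096]
        int yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; // [0, 399]
        int doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
        int mp = (5 * doy + 2) / 153;             // March-based month [0, 11]
        int day = doy - (153 * mp + 2) / 5 + 1;   // [1, 31]
        int month = mp < 10 ? mp + 3 : mp - 9;    // [1, 12]
        int year = yoe + era * 400 + (month <= 2 ? 1 : 0);
        return year * 10000 + month * 100 + day;
    }

    /** Fills out[i] with the year of days[i] for the first n entries. */
    static void yearOf(long[] days, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            out[i] = toPackedDate((int) days[i]) / 10000;
        }
    }
}
```

For example, toPackedDate(0) gives 19700101 and toPackedDate(11016) gives 20000229. Whether something like this is acceptable depends on how Hive wants to treat time zones and pre-Gregorian dates, which this sketch ignores.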
I'll start with a basic implementation using java.sql.Date, then look for
ways to optimize it.
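As a rough picture of what such a baseline might look like (a sketch only, not the actual patch), the loop below reuses one java.sql.Date and one Calendar across all rows instead of allocating per row. The class name SqlDateYearBaseline and the plain long[] arrays are placeholders, and interpreting the day count in UTC is an assumption; Hive's own time zone handling may differ.

```java
import java.sql.Date;
import java.util.Calendar;
import java.util.TimeZone;

final class SqlDateYearBaseline {
    private static final long MILLIS_PER_DAY = 86_400_000L;

    // Scratch objects reused for every row, so the loop itself allocates nothing.
    private final Date scratch = new Date(0L);
    private final Calendar calendar =
        Calendar.getInstance(TimeZone.getTimeZone("UTC"));

    /** Fills out[i] with the year of daysSinceEpoch[i] for the first n entries. */
    void yearOf(long[] daysSinceEpoch, long[] out, int n) {
        for (int i = 0; i < n; i++) {
            scratch.setTime(daysSinceEpoch[i] * MILLIS_PER_DAY);
            calendar.setTime(scratch);
            out[i] = calendar.get(Calendar.YEAR);
        }
    }
}
```

The Calendar still recomputes all of its fields for each row, which is the overhead the later optimization would try to remove.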
Teddy
> Implement vectorized support for the DATE data type
> ---------------------------------------------------
>
> Key: HIVE-5761
> URL: https://issues.apache.org/jira/browse/HIVE-5761
> Project: Hive
> Issue Type: Sub-task
> Reporter: Eric Hanson
>
> Add support to allow queries referencing DATE columns and expression results
> to run efficiently in vectorized mode. This should re-use the code for the
> integer/timestamp types to the extent possible and beneficial. Include
> unit tests and end-to-end tests. Consider re-using or extending existing
> end-to-end tests for vectorized integer and/or timestamp operations.
--
This message was sent by Atlassian JIRA
(v6.1#6144)