Ryan Blue created PARQUET-200:
---------------------------------
Summary: Add nanosecond time and timestamp annotations
Key: PARQUET-200
URL: https://issues.apache.org/jira/browse/PARQUET-200
Project: Parquet
Issue Type: Improvement
Components: parquet-format
Reporter: Ryan Blue
When the date/time type annotations were added, we decided not to add
precisions finer than milliseconds because there wasn't a clear requirement.
I think the requirement is nanosecond precision. The SQL spec requires at
least microsecond precision, and many databases support nanosecond precision,
including SQL engines on Hadoop: Hive, Phoenix, and Impala.
I propose adding the following type annotations:
* {{TIME_NANOS}}: annotates an int64 (8 bytes) and represents the number of
nanoseconds since midnight.
* {{TIMESTAMP_NANOS}}: annotates a 12-byte fixed_len_byte_array containing,
first, an 8-byte number of milliseconds since the Unix epoch and, second, a
4-byte number of nanoseconds past that millisecond (so always between 0 and
999,999). Both values are little-endian.
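A minimal Python sketch of the two proposed layouts, to make the byte arithmetic concrete. The function names are hypothetical, and it assumes both the 8-byte and 4-byte fields are signed integers (the proposal does not say); splitting with floor division keeps the nanosecond field in [0, 999,999] even for timestamps before 1970.

```python
import struct

NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_MILLI = 1_000_000
NANOS_PER_DAY = 86_400 * NANOS_PER_SECOND


def encode_time_nanos(hours: int, minutes: int, seconds: int, nanos: int) -> int:
    """TIME_NANOS (as proposed): int64 count of nanoseconds since midnight."""
    value = ((hours * 60 + minutes) * 60 + seconds) * NANOS_PER_SECOND + nanos
    if not 0 <= value < NANOS_PER_DAY:
        raise ValueError("time of day out of range")
    return value


def encode_timestamp_nanos(epoch_nanos: int) -> bytes:
    """TIMESTAMP_NANOS (as proposed): 8-byte millis since epoch, then
    4-byte nanos past that millisecond, both little-endian (12 bytes)."""
    # divmod floors, so the remainder is always in [0, 999_999], even for
    # negative (pre-1970) inputs.
    millis, nanos = divmod(epoch_nanos, NANOS_PER_MILLI)
    # "<qi" = little-endian, no padding: int64 (8 bytes) + int32 (4 bytes).
    return struct.pack("<qi", millis, nanos)


def decode_timestamp_nanos(buf: bytes) -> int:
    """Reassemble nanoseconds since epoch from the 12-byte value."""
    millis, nanos = struct.unpack("<qi", buf)
    return millis * NANOS_PER_MILLI + nanos
```

For example, `encode_timestamp_nanos(1_500_000_000_123_456_789)` stores 1,500,000,000,123 in the first 8 bytes and 456,789 in the last 4.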
The timestamp layout lets object models that don't support nanosecond times (or
don't need that precision for processing) ignore the second value and read only
the millisecond timestamp.
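To illustrate the point about ignoring the second value, here is a sketch of a hypothetical millisecond-only reader. It assumes the proposed 12-byte layout with a signed little-endian 8-byte millisecond field, and it simply never reads the trailing 4 bytes, so the result is truncated to the millisecond.

```python
import datetime
import struct

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def read_millis_only(buf: bytes) -> datetime.datetime:
    """Hypothetical millisecond-precision reader for the proposed
    TIMESTAMP_NANOS value: only the first 8 bytes are decoded."""
    # unpack_from reads just the leading int64; the 4-byte nanosecond
    # field at offset 8 is left untouched.
    (millis,) = struct.unpack_from("<q", buf, 0)
    return EPOCH + datetime.timedelta(milliseconds=millis)


# A 12-byte value: 1,500,000,000,123 ms plus 456,789 extra nanoseconds.
value = struct.pack("<qi", 1_500_000_000_123, 456_789)
print(read_millis_only(value))
```

Because the millisecond field comes first, such a reader needs no knowledge of the nanosecond field at all.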