Ryan Blue created PARQUET-200:
---------------------------------

             Summary: Add nanosecond time and timestamp annotations
                 Key: PARQUET-200
                 URL: https://issues.apache.org/jira/browse/PARQUET-200
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-format
            Reporter: Ryan Blue


When the date/time type annotations were added, we decided not to add 
precisions finer than milliseconds because there wasn't a clear requirement. 
I now think there is a requirement, and that it is for nanosecond precision. 
The SQL spec requires at least microsecond precision, and many databases 
support nanoseconds, including SQL engines on Hadoop: Hive, Phoenix, and Impala.

I propose adding the following type annotations:

* {{TIME_NANOS}}: annotates an int64 (8 bytes) that stores the number of 
nanoseconds since midnight.
* {{TIMESTAMP_NANOS}}: annotates a 12-byte fixed-length value that holds, 
first, an 8-byte number of milliseconds since the Unix epoch and, second, a 
4-byte number of nanoseconds elapsed since that millisecond (0 to 999,999). 
Both values are little-endian.
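
To make the two layouts concrete, here is a minimal Python sketch. The 
function names are hypothetical and not part of parquet-format. The proposal 
does not state signedness or how pre-1970 values are split, so the sketch 
assumes a signed 8-byte millisecond count, an unsigned 4-byte nanosecond 
remainder, and floor division for negative timestamps.

```python
import struct

NANOS_PER_MILLI = 1_000_000
NANOS_PER_SECOND = 1_000_000_000


def time_nanos(hour: int, minute: int, second: int, nano: int = 0) -> int:
    """TIME_NANOS sketch: nanoseconds since midnight, to be stored as an int64."""
    return ((hour * 60 + minute) * 60 + second) * NANOS_PER_SECOND + nano


def encode_timestamp_nanos(epoch_nanos: int) -> bytes:
    """TIMESTAMP_NANOS sketch: pack epoch nanoseconds into the 12-byte layout."""
    # Assumption: divmod floors, so the remainder stays in [0, 999_999]
    # even for timestamps before the epoch.
    millis, nanos_of_milli = divmod(epoch_nanos, NANOS_PER_MILLI)
    # "<qI" = little-endian, signed 64-bit millis, then unsigned 32-bit nanos.
    # The signedness of both fields is an assumption of this sketch.
    return struct.pack("<qI", millis, nanos_of_milli)


def decode_timestamp_nanos(raw: bytes) -> int:
    """Inverse of encode_timestamp_nanos: 12 bytes back to epoch nanoseconds."""
    millis, nanos_of_milli = struct.unpack("<qI", raw)
    return millis * NANOS_PER_MILLI + nanos_of_milli
```

For example, {{encode_timestamp_nanos(1_500_000)}} stores 1 millisecond in the 
first 8 bytes and 500,000 nanoseconds in the last 4.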

This timestamp layout lets object models that don't support nanosecond times 
(or don't need them for processing) ignore the second value and read only the 
millisecond count.
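
As a rough illustration of that point, a millisecond-only reader could look 
like the Python sketch below. The helper name is hypothetical, and treating 
the value as a signed, UTC-based count is an assumption; the proposal only 
says "milliseconds from unix epoch".

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_millis_only(raw: bytes) -> datetime:
    """Read a 12-byte TIMESTAMP_NANOS value at millisecond precision."""
    # Only the first 8 bytes (little-endian millis) are decoded; the trailing
    # 4-byte nanosecond remainder is never touched.
    (millis,) = struct.unpack_from("<q", raw, 0)
    return EPOCH + timedelta(milliseconds=millis)
```

The reader needs no arithmetic on the nanosecond field, which is the 
convenience the split layout is meant to provide.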



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
