I've been looking at the proposed date and time logical types and have a few questions. For reference, here is the proposal:

Date and Time:
* date: int32 (truncated Julian day)
* time_milli: int32 (milliseconds since midnight)
* time_micro: int64 (microseconds since midnight)
* interval: (proposed 12 bytes, Nong to review)

Timestamps: always stored as epoch time (units since UTC Jan 1, 1970). Can also be annotated with an ISO time zone string in the footer.
* timestamp_milli: int64 (milliseconds)
* timestamp_micro: int64 (microseconds)

Could we use the same epoch for both date and the timestamps? I think it will get confusing for implementations if we use a Julian epoch for dates and the Unix epoch for timestamps. I like that the proposed timestamp_milli is familiar to most Java developers, because both java.util.Date and Joda's Instant are backed by milliseconds since the Unix epoch. Would the Unix epoch work for date as well?
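To make the single-epoch question concrete, here is a minimal sketch of a date stored as days since the Unix epoch. The function names are hypothetical, and the Julian conversion assumes that "truncated julian day" means the integer Julian Day Number, which the proposal does not spell out.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)
MILLIS_PER_DAY = 86_400_000
# Julian Day Number of 1970-01-01 (assumes the integer JDN convention).
JDN_OF_UNIX_EPOCH = 2_440_588


def date_to_unix_days(d):
    """Days since 1970-01-01; negative for earlier dates."""
    return (d - UNIX_EPOCH).days


def unix_days_to_date(days):
    """Inverse of date_to_unix_days."""
    return UNIX_EPOCH + timedelta(days=days)


def timestamp_milli_to_unix_days(ts_milli):
    """UTC day of a timestamp_milli value; floor division keeps
    pre-1970 timestamps on the correct day."""
    return ts_milli // MILLIS_PER_DAY


def julian_day_to_unix_days(jdn):
    """Shift a Julian Day Number onto the Unix-epoch day count."""
    return jdn - JDN_OF_UNIX_EPOCH
```

With a shared epoch, the date of a timestamp is a single floor division and no offset constant is needed. An int32 day count covers roughly 5.8 million years on either side of 1970, so range should not be a concern.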

Why is the maximum precision microseconds? Both previous proposals used nanoseconds. The gain seems to be that timestamp_micro fits in an int64, but it also means that time_micro uses only 5 bits of the extra 4 bytes it takes to store: a day has 86,400,000,000 microseconds, which needs 37 bits.
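The bit counts behind this can be checked with a few lines of arithmetic. This is only a sketch; the variable names are made up for illustration, and the year length of 365.25 days is an approximation.

```python
MILLIS_PER_DAY = 86_400 * 1_000
MICROS_PER_DAY = 86_400 * 1_000_000
NANOS_PER_DAY = 86_400 * 1_000_000_000


def bits_needed(max_value):
    """Number of bits required to hold a non-negative integer."""
    return max_value.bit_length()


# Largest "since midnight" value is one unit short of a full day.
time_milli_bits = bits_needed(MILLIS_PER_DAY - 1)
time_micro_bits = bits_needed(MICROS_PER_DAY - 1)
time_nano_bits = bits_needed(NANOS_PER_DAY - 1)

# Bits of the second 4-byte word that time_micro actually uses.
time_micro_extra_bits = time_micro_bits - 32

# Approximate span, in years on each side of the epoch, of a signed int64.
int64_nano_range_years = 2**63 / (365.25 * NANOS_PER_DAY)
int64_micro_range_years = 2**63 / (365.25 * MICROS_PER_DAY)
```

time_milli needs 27 bits, time_micro 37, and a nanosecond time of day 47, so all three fit the proposed widths with room to spare. For timestamps, a signed int64 of nanoseconds reaches only about 292 years from the epoch, against roughly 292,000 years for microseconds.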

One solution I'd like to consider is what Apache Phoenix does: it uses a separate 4 bytes to store a nanosecond offset (20 bits, enough for the 1,000,000 nanoseconds in a millisecond). This would make it possible to ignore the nanoseconds in some cases, such as most comparisons in filters. For time, it would take no more space than the time_micro type (4 bytes of milliseconds plus 4 bytes of offset). The timestamp equivalent would require another 4 bytes on top of the int64, but you'd get nanosecond precision.
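A rough sketch of that layout follows, assuming the offset counts nanoseconds within the millisecond and that the two fields are stored big-endian, milliseconds first. The function names and byte order are illustrative assumptions and have not been checked against Phoenix's actual encoding.

```python
import struct

NANOS_PER_MILLI = 1_000_000


def split_nanos(total_nanos):
    """Split a nanosecond count into (millis, offset within the milli).
    divmod floors, so the offset stays in 0..999_999 for negative input."""
    return divmod(total_nanos, NANOS_PER_MILLI)


def encode_time(nanos_since_midnight):
    """int32 millis + uint32 nano offset: 8 bytes, same as time_micro."""
    millis, offset = split_nanos(nanos_since_midnight)
    return struct.pack(">iI", millis, offset)


def encode_timestamp(nanos_since_epoch):
    """int64 millis + uint32 nano offset: 12 bytes."""
    millis, offset = split_nanos(nanos_since_epoch)
    return struct.pack(">qI", millis, offset)


def decode_timestamp(buf):
    """Recombine both fields into nanoseconds since the epoch."""
    millis, offset = struct.unpack(">qI", buf)
    return millis * NANOS_PER_MILLI + offset


def timestamp_millis_only(buf):
    """Read just the leading int64, ignoring the nanosecond offset."""
    return struct.unpack_from(">q", buf)[0]
```

A filter that only needs millisecond precision can call `timestamp_millis_only` and never touch the trailing 4 bytes, which is the comparison shortcut described above.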

Another win for the 4-byte nanosecond offset is that we are more likely to be able to use the same representation in the HBase type spec, since Phoenix already has data using this type and will almost certainly require nanosecond precision.

Thanks for putting together this proposal. It's great to see how close this is to being finished.

rb

--
Ryan Blue
Software Engineer
Cloudera, Inc.
