On 07/10/2014 01:26 PM, Nong Li wrote:
On Wed, Jun 11, 2014 at 7:25 PM, Jacques Nadeau <[email protected]> wrote:
As far as truncated Julian day versus Unix epoch goes, I thought they
started at the same time, which is why I suggested it. Upon further
looking, I realize they do not. As such, I guess the best option is
Unix epoch divided by 86400.
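For concreteness, here is a minimal sketch (in Java; the class and method
names are hypothetical, not from any proposal) of what "Unix epoch divided
by 86400" means for a date value:

    import java.time.LocalDate;

    public class EpochDays {
        // A date is stored as days since the Unix epoch (1970-01-01),
        // i.e. the epoch-second value divided by 86400.
        public static int toEpochDays(LocalDate date) {
            return (int) date.toEpochDay();
        }

        public static void main(String[] args) {
            // 1970-01-02 is 86400 seconds after the epoch: 86400 / 86400 = 1
            System.out.println(toEpochDays(LocalDate.of(1970, 1, 2))); // 1
        }
    }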
I don't think anyone feels too strongly about which one we pick, but I
agree we should pick the same for both.
Ryan & Jacques: you guys seem to have a stronger opinion on this.
Jacques, I can't tell from your previous email
if we've got consensus now.
The pull request, #3, uses Unix epoch now. I think we have consensus.
Why is the maximum precision microseconds? Both previous
proposals used nanoseconds instead. The gain seems to be that
timestamp_micro fits in an int64, but that means the time_micro
type (which needs 37 bits to cover the 86,400,000,000 microseconds
in a day) only uses 5 bits of the extra 4 bytes used to store it.
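A quick worked check of that bit count (my own sketch, not from the
thread):

    public class TimeMicroBits {
        public static void main(String[] args) {
            long microsPerDay = 86_400L * 1_000_000L; // 86,400,000,000
            // Bits needed to represent any time-of-day in microseconds:
            int bits = 64 - Long.numberOfLeadingZeros(microsPerDay - 1);
            System.out.println(bits); // 37: only 5 bits beyond an int32
        }
    }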
One solution I'd like to consider is what Apache Phoenix does.
Phoenix uses a separate 4 bytes to store a nanosecond offset (20
bits). This would enable ignoring the nanoseconds in some cases,
like for most comparisons in filters. It would take no more
space than the time_micro type and would require another 4 bytes
for the timestamp equivalent, but you'd get nanosecond precision.
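A minimal sketch of that split, assuming the offset is the nanoseconds
within the millisecond (the names here are illustrative, not Phoenix's
actual API):

    public class SplitTimestamp {
        // Split a nanosecond timestamp into a millisecond value plus a
        // nanosecond offset. The offset is at most 999,999, which fits
        // in 20 bits, so it can live in a separate 4-byte field and be
        // skipped for most filter comparisons.
        public static void main(String[] args) {
            long nanos = 1_400_000_000_123_456_789L; // example instant
            long millis = nanos / 1_000_000L;        // comparable alone
            int nanoOffset = (int) (nanos % 1_000_000L); // 0..999,999
            System.out.println(millis + " ms + " + nanoOffset + " ns");
        }
    }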
How do you propose adding those 4 bytes? I don't want to introduce
"compound single column types".
What if we added time_nano and timestamp_nano and used
fixed_len_byte_array as the underlying storage type?
I'm proposing we replace the _micro types with _nano types. time_nano
will fit in an int64, but timestamp_nano will not. I propose we store
timestamp_nano as a 12-byte fixed, with the first 8 bytes encoding
the time in milliseconds and the remaining 4 storing the nanosecond
offset. Both values should use big-endian byte order.
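A sketch of that 12-byte layout as I read the proposal (not a settled
format; the class name is hypothetical):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public class TimestampNano {
        // Encode a nanosecond timestamp as the proposed 12-byte fixed:
        // 8 big-endian bytes of milliseconds, then 4 big-endian bytes
        // of the nanosecond offset within that millisecond.
        public static byte[] encode(long nanos) {
            long millis = nanos / 1_000_000L;
            int nanoOffset = (int) (nanos % 1_000_000L);
            return ByteBuffer.allocate(12)
                    .order(ByteOrder.BIG_ENDIAN) // the default; stated for clarity
                    .putLong(millis)
                    .putInt(nanoOffset)
                    .array();
        }

        public static long decode(byte[] bytes) {
            ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
            return buf.getLong() * 1_000_000L + buf.getInt();
        }
    }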
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.