From an Apache point of view, we really need to move this document and the
discussion to the Apache wiki and mailing lists.

Did you want to take a first pass at moving it to Hive's wiki?

.. Owen

On Tue, Dec 11, 2018 at 10:40 AM Zoltan Ivanfi <z...@cloudera.com> wrote:

> Hi Owen,
>
> Thanks, I think your email contains a great summary of the problems
> tackled in the proposal. I would like to highlight two particular topics from
> the discussion that we are having in the comments (details can be read in
> the document
> <https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit>
> ):
>
>    - It seems that we have agreement on the desired semantics of the more
>    explicit SQL types. In particular, I was glad to hear that the TIMESTAMP
>    WITH LOCAL TIME ZONE type that is already implemented in Hive is supposed
>    to have Instant semantics. (In fact, it already does have Instant
>    semantics, but it also has additional time zone information that is unused
>    at this moment, and I wasn't sure whether that will be utilized, changing
>    the semantics, or whether the semantics will remain and the superfluous time
>    zone data will be removed.)
>    - We are still discussing what is the best course of action to take
>    with the plain TIMESTAMP type, which behaved differently in different file
>    formats in Hive 2 and was made to behave the same way in a
>    compatibility-breaking manner in Hive 3. My take on this type is that it
>    has already been used to write huge amounts of data and for this reason we
>    should restore its Avro- and Parquet-specific inconsistent behaviour
>    (possibly controlled by a feature flag), so that legacy data remains
>    readable and legacy workarounds remain functional. The new, more explicit
>    SQL types will provide a clear migration path away from the messy TIMESTAMP
>    type.
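The feature-flag idea above can be sketched as follows. This is a minimal, hypothetical model, not Hive code: it assumes the Parquet file holds a plain TIMESTAMP value normalized to a UTC instant (as Hive 2 wrote it), and the class `LegacyTimestampReader`, the method `read` and its `legacyInstantSemantics` parameter are illustrative names, not an existing Hive configuration property. With the flag on, the reader converts the instant to its own time zone (the Hive 2 Parquet/Avro behaviour); with it off, the same bytes are shown as a UTC wall clock, i.e. LocalDateTime semantics with no conversion.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical sketch: reading a plain TIMESTAMP value whose stored form is
// an instant normalized to UTC (assumption: legacy Hive 2 Parquet data).
final class LegacyTimestampReader {

    // legacyInstantSemantics == true: convert the instant to the reader's
    // wall clock, mimicking the Hive 2 Parquet/Avro behaviour.
    // legacyInstantSemantics == false: show the stored value as a UTC wall
    // clock, i.e. LocalDateTime semantics with no time zone conversion.
    static LocalDateTime read(Instant stored, ZoneId readerZone, boolean legacyInstantSemantics) {
        ZoneId zone = legacyInstantSemantics ? readerZone : ZoneOffset.UTC;
        return LocalDateTime.ofInstant(stored, zone);
    }

    public static void main(String[] args) {
        // Written in Los Angeles at 19:54 local time, normalized to UTC on write.
        ZoneId losAngeles = ZoneId.of("America/Los_Angeles");
        Instant stored = LocalDateTime.of(2018, 12, 10, 19, 54)
                .atZone(losAngeles)
                .toInstant();

        System.out.println(read(stored, losAngeles, true));  // 2018-12-10T19:54
        System.out.println(read(stored, losAngeles, false)); // 2018-12-11T03:54
    }
}
```

With the flag on, a reader in Los Angeles sees the same 19:54 the writer entered; with it off, the same bytes surface as 03:54 on the following day, which illustrates why legacy data can appear shifted when it is read without the conversion.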
>
> All in all, I feel that we are converging towards a common goal and I have
> high hopes that the more explicit timestamp types will have much better
> interoperability and consistency across different Hadoop SQL engines.
>
> Thanks,
>
> Zoltan
>
>
> On Mon, Dec 10, 2018 at 7:54 PM Owen O'Malley <owen.omal...@gmail.com>
> wrote:
>
>> Thank you for starting this discussion. Clearly the Hive semantics of
>> timestamps are very messed up, but they have been moving in the right
>> direction of becoming more SQL standard compliant. I'm pulling this
>> discussion back to the list rather than the personal Google Doc, which
>> isn't very collaborative.
>>
>> I like your breakdown of the semantics:
>>
>>    - Instant - point in time that will appear different depending on the
>>    reader time zone
>>    - LocalDateTime - consistent hour and minute regardless of the reader
>>    time zone.
>>    - OffsetDateTime - consistent hour and minute with the offset of the
>>    writer time zone
>>
>> The SQL standard has:
>>
>>    - Timestamp & Timestamp without time zone = LocalDateTime
>>    - Timestamp with time zone = OffsetDateTime
>>
>> Hive 2 had very confused semantics for timestamp:
>>
>>    - When storage was ORC, text, or RCFile with a text serde it was
>>    LocalDateTime
>>    - When storage was Avro, Parquet, or RCFile with a binary serde it
>>    was Instant
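A toy model of the Hive 2 split may make the inconsistency concrete. It assumes, as a simplification, that the first group of formats keeps the writer's wall clock unchanged, while the second group normalizes to a UTC instant on write and converts to the reader's time zone on read. `Hive2TimestampModel` and its `Family` enum are hypothetical names, not Hive classes.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

// Illustrative model (not Hive code) of the two Hive 2 timestamp behaviours.
final class Hive2TimestampModel {

    // Hypothetical labels for the two groups of storage formats.
    enum Family { LOCAL_DATE_TIME, INSTANT }

    // Writes a wall-clock value in writerZone and reads it back in readerZone.
    static LocalDateTime roundTrip(Family family, LocalDateTime written,
                                   ZoneId writerZone, ZoneId readerZone) {
        switch (family) {
            case LOCAL_DATE_TIME:
                // ORC, text, RCFile with a text SerDe: wall clock kept as-is.
                return written;
            case INSTANT:
                // Avro, Parquet, RCFile with a binary SerDe: normalized to an
                // instant on write, converted to the reader's zone on read.
                return LocalDateTime.ofInstant(written.atZone(writerZone).toInstant(), readerZone);
            default:
                throw new IllegalArgumentException("unknown family: " + family);
        }
    }

    public static void main(String[] args) {
        ZoneId writer = ZoneId.of("America/Los_Angeles");
        ZoneId reader = ZoneId.of("Europe/Budapest");
        LocalDateTime written = LocalDateTime.of(2018, 12, 10, 19, 54);

        System.out.println(roundTrip(Family.LOCAL_DATE_TIME, written, writer, reader)); // 2018-12-10T19:54
        System.out.println(roundTrip(Family.INSTANT, written, writer, reader));         // 2018-12-11T04:54
    }
}
```

When writer and reader share a time zone, both families return the same value (apart from daylight saving time edge cases), which is why the inconsistency is easy to miss until the data is read from a different zone.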
>>
>> Hive 3.1 has moved toward the SQL standard extended with Oracle's
>> timestamp with local time zone:
>>
>>    - Timestamp = LocalDateTime
>>    - Timestamp with local time zone = Instant
>>
>> This leaves us with a few problems:
>>
>>    - The Hive bindings to Parquet and Avro don't handle timestamps
>>    correctly.
>>    - ORC doesn't support timestamps with local time zone. I started
>>    working on it in ORC-189.
>>    - We don't have timestamp with time zone support.
>>
>> .. Owen
>>
>> On Thu, Dec 6, 2018 at 7:55 AM Marta Kuczora
>> <kuczo...@cloudera.com.invalid> wrote:
>>
>>> Hi Hive Community,
>>>
>>> I would like to share the following document on our "Consistent Timestamp
>>> types in Hadoop" plans for review.
>>>
>>> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit
>>>
>>> With this plan we would like to reach agreement on consistent timestamp
>>> behavior across Hive, Spark and Impala, and in order to achieve this, we are
>>> sharing this document with all three communities.
>>>
>>> Please review and comment, any feedback is much appreciated!
>>>
>>> Regards,
>>> Marta
>>>
>>
