From an Apache point of view, we really need to move this document and the discussion to the Apache wiki and mailing lists.
Did you want to take a first pass at moving it to Hive's wiki?

.. Owen

On Tue, Dec 11, 2018 at 10:40 AM Zoltan Ivanfi <z...@cloudera.com> wrote:

> Hi Owen,
>
> Thanks, I think your email contains a great summary of the problems
> tackled in the proposal. I would like to highlight two particular topics
> from the discussion that we are having in the comments (details can be
> read in the document
> <https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit>):
>
> - It seems that we have agreement on the desired semantics of the more
>   explicit SQL types. In particular, I was glad to hear that the TIMESTAMP
>   WITH LOCAL TIME ZONE type that is already implemented in Hive is supposed
>   to have Instant semantics. (In fact, it already does have Instant
>   semantics, but it also carries additional time zone information that is
>   unused at the moment, and I wasn't sure whether that information will be
>   utilized, changing the semantics, or whether the semantics will remain
>   and the superfluous time zone data will be removed.)
> - We are still discussing the best course of action to take with the
>   plain TIMESTAMP type, which behaved differently in different file formats
>   in Hive 2 and was made to behave the same way in a compatibility-breaking
>   manner in Hive 3. My take on this type is that it has already been used
>   to write huge amounts of data, and for this reason we should restore its
>   Avro- and Parquet-specific inconsistent behaviour (possibly controlled by
>   a feature flag), so that legacy data remains readable and legacy
>   workarounds remain functional. The new, more explicit SQL types will
>   provide a clear migration path away from the messy TIMESTAMP type.
>
> All in all, I feel that we are converging towards a common goal, and I have
> high hopes that the more explicit timestamp types will have much better
> interoperability and consistency across different Hadoop SQL engines.
>
> Thanks,
>
> Zoltan
>
>
> On Mon, Dec 10, 2018 at 7:54 PM Owen O'Malley <owen.omal...@gmail.com>
> wrote:
>
>> Thank you for starting this discussion. Clearly the Hive semantics for
>> timestamp are very messed up, but they have been moving in the right
>> direction of becoming more SQL standard compliant. I'm pulling this
>> discussion back to the list rather than the personal Google Doc, which
>> isn't very collaborative.
>>
>> I like your breakdown of the semantics:
>>
>>    - Instant - point in time that will appear different depending on the
>>    reader time zone.
>>    - LocalDateTime - consistent hour and minute regardless of the reader
>>    time zone.
>>    - OffsetDateTime - consistent hour and minute with the offset of the
>>    writer time zone.
>>
>> The SQL standard has:
>>
>>    - Timestamp & Timestamp without time zone = LocalDateTime
>>    - Timestamp with time zone = OffsetDateTime
>>
>> Hive 2 had very confused semantics for timestamp:
>>
>>    - When storage was ORC, text, or RCFile with a text serde, it was
>>    LocalDateTime.
>>    - When storage was Avro, Parquet, or RCFile with a binary serde, it
>>    was Instant.
>>
>> Hive 3.1 has moved toward the SQL standard extended with Oracle's
>> timestamp with local time zone:
>>
>>    - Timestamp = LocalDateTime
>>    - Timestamp with local time zone = Instant
>>
>> This leaves us with a few problems:
>>
>>    - The Hive bindings to Parquet and Avro don't handle timestamps
>>    correctly.
>>    - ORC doesn't support timestamps with local time zone. I started
>>    working on it in ORC-189.
>>    - We don't have timestamp with time zone support.
>>
>> .. Owen
>>
>> On Thu, Dec 6, 2018 at 7:55 AM Marta Kuczora
>> <kuczo...@cloudera.com.invalid> wrote:
>>
>>> Hi Hive Community,
>>>
>>> I would like to share the following document on our "Consistent Timestamp
>>> types in Hadoop" plans for review.
>>>
>>> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit
>>>
>>> With this plan we would like to reach agreement on consistent timestamp
>>> behavior in Hive, Spark and Impala, and to achieve this, we are sharing
>>> this document with all three communities.
>>>
>>> Please review and comment; any feedback is much appreciated!
>>>
>>> Regards,
>>> Marta
>>>
>>
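The three semantics in Owen's breakdown (Instant, LocalDateTime, OffsetDateTime) share their names with `java.time` classes, so a small Java sketch may help make the difference concrete. It is only an illustration under stated assumptions: the writer zone (America/Los_Angeles), the reader zone (Europe/Budapest), the sample value, and the names `TimestampSemantics`, `readAsInstant`, `readAsLocal` and `readAsOffset` are all hypothetical, and the sketch does not model how Hive, Parquet, Avro or ORC actually encode timestamps on disk.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;

// Hypothetical illustration of the three timestamp semantics from the thread.
// The zones and the sample value are assumptions chosen for the example.
public class TimestampSemantics {

    // Instant semantics: the stored value is a point on the UTC timeline,
    // and each reader renders it in its own time zone.
    public static LocalDateTime readAsInstant(Instant stored, ZoneId readerZone) {
        return LocalDateTime.ofInstant(stored, readerZone);
    }

    // LocalDateTime semantics: the stored wall-clock value is returned
    // unchanged; the reader's time zone is deliberately ignored.
    public static LocalDateTime readAsLocal(LocalDateTime stored, ZoneId readerZone) {
        return stored;
    }

    // OffsetDateTime semantics: the wall-clock value travels together with
    // the writer's offset, so every reader sees the same hour, minute and offset.
    public static OffsetDateTime readAsOffset(OffsetDateTime stored, ZoneId readerZone) {
        return stored;
    }

    public static void main(String[] args) {
        ZoneId writerZone = ZoneId.of("America/Los_Angeles"); // assumed writer zone
        ZoneId readerZone = ZoneId.of("Europe/Budapest");     // assumed reader zone
        LocalDateTime written = LocalDateTime.of(2018, 12, 11, 10, 40);

        // What each semantics would persist for the same user-supplied value.
        Instant asInstant = written.atZone(writerZone).toInstant();
        OffsetDateTime asOffset = written.atZone(writerZone).toOffsetDateTime();

        // Prints "Instant:        2018-12-11T19:40" (shifted to the reader's zone).
        System.out.println("Instant:        " + readAsInstant(asInstant, readerZone));
        // Prints "LocalDateTime:  2018-12-11T10:40" (unchanged wall clock).
        System.out.println("LocalDateTime:  " + readAsLocal(written, readerZone));
        // Prints "OffsetDateTime: 2018-12-11T10:40-08:00" (writer's offset kept).
        System.out.println("OffsetDateTime: " + readAsOffset(asOffset, readerZone));
    }
}
```

Under the Hive 2 behaviour Owen describes, a plain TIMESTAMP read from Avro or Parquet followed the first line of output, while one read from ORC or text followed the second, which is why the same value could come back with a different wall-clock time depending on the file format.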