Hi,

I'd like to discuss the future of timestamp support in Spark, in particular
with respect to handling time zones in the different SQL types. In a nutshell:

* There are at least 3 different ways of handling the timestamp type across
time zone changes (sketched briefly below).
* We'd like Spark to clearly distinguish all 3 types (it currently
implements only 1 of them), in a way that is both backwards compatible and
compliant with the SQL standard.
* We aim to get agreement across the Spark, Hive, and Impala communities.
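
To make the distinction concrete, here's a rough sketch in the Scala REPL,
using Java's java.time types as stand-ins for the three semantics. (The
mapping is just my shorthand for this email; Zoltan's doc gives the precise
definitions.)

  import java.time.{Instant, LocalDateTime, OffsetDateTime, ZoneId}

  // 1. Instant semantics: a fixed point on the timeline. The same
  //    value renders differently depending on the session time zone.
  val instant = Instant.parse("2018-01-01T12:00:00Z")
  instant.atZone(ZoneId.of("UTC"))                 // 2018-01-01T12:00Z[UTC]
  instant.atZone(ZoneId.of("America/Los_Angeles")) // 2018-01-01T04:00-08:00[...]

  // 2. LocalDateTime semantics: a time-zone-agnostic wall-clock
  //    reading. Every reader sees the same value, in every zone.
  val local = LocalDateTime.parse("2018-01-01T12:00:00")

  // 3. OffsetDateTime semantics: a point in time together with an
  //    explicit UTC offset, so the original offset is preserved.
  val offset = OffsetDateTime.parse("2018-01-01T12:00:00+01:00")
  offset.toInstant   // 2018-01-01T11:00:00Z
  offset.getOffset   // +01:00

If I'm mapping things correctly, Spark's existing TIMESTAMP behaves like
the first of these.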

Zoltan Ivanfi (Parquet PMC, also my coworker) has written up a detailed
doc describing the problem, the current state of the various SQL engines,
and how we can get to a better state without breaking any existing use
cases.  The proposal stands on its own for Spark, but we're also taking it
to the Hive and Impala communities, as it's better for everyone if the
engines stay compatible with each other.

Note that this isn't yet proposing a specific implementation in Spark,
just a description of the overall problem and our end goal.  We're going to
each community to get agreement on the overall direction; each community
can then figure out the specifics as it sees fit.  (I don't think there
are any technical hurdles with this approach, e.g. nothing that would make
it impossible to do in Spark.)

Here's a link to the doc Zoltan has put together.  It is a bit long, but it
explains how such a seemingly simple concept became such a mess, and how we
can clean it up.

https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.dq3b1mwkrfky

Please review the proposal and let us know your opinions, concerns, and
suggestions.

thanks,
Imran
