Hi, I'd like to discuss the future of timestamp support in Spark, in particular with respect to handling timezones in different SQL types. In a nutshell:
* There are at least 3 different ways of handling the timestamp type across timezone changes.
* We'd like Spark to clearly distinguish the 3 types (it currently implements 1 of them; see the P.S. below for a sketch of that behavior), in a way that is backwards compatible and also compliant with the SQL standard.
* We'll get agreement across Spark, Hive, and Impala.

Zoltan Ivanfi (Parquet PMC, also my coworker) has written up a detailed doc describing the problem in more detail, the state of various SQL engines, and how we can get to a better state without breaking any current use cases. The proposal is good for Spark by itself. We're also going to the Hive and Impala communities with this proposal, as it's better for everyone if everything is compatible.

Note that this isn't proposing a specific implementation in Spark yet, just a description of the overall problem and our end goal. We're going to each community to get agreement on the overall direction; then each community can figure out the specifics as it sees fit. (I don't think there are any technical hurdles with this approach, e.g., anything that would make it infeasible in Spark.)

Here's a link to the doc Zoltan has put together. It is a bit long, but it explains how such a seemingly simple concept has become such a mess, and how we can get to a better state:

https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.dq3b1mwkrfky

Please review the proposal and let us know your opinions, concerns, and suggestions.

thanks,
Imran
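P.S. To make "it currently implements 1 of them" concrete, here is a minimal sketch of Spark's current behavior, e.g. in spark-shell. It assumes Spark 2.2+ (where the spark.sql.session.timeZone setting is available) and a local session; exact console output may vary by version. Spark's TIMESTAMP today stores an instant (normalized to UTC) and renders it in the session time zone, so the displayed wall-clock value shifts when that time zone changes; under a timezone-agnostic interpretation, by contrast, it would not.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("timestamp-semantics-sketch")
      .getOrCreate()

    // Parse a wall-clock string while the session time zone is UTC,
    // so the stored instant is 2018-01-01T12:00:00Z.
    spark.conf.set("spark.sql.session.timeZone", "UTC")
    val df = spark.sql("SELECT CAST('2018-01-01 12:00:00' AS TIMESTAMP) AS ts")
    df.show()  // 2018-01-01 12:00:00

    // The same stored instant, re-rendered after changing the session
    // time zone, shows a different wall-clock value (UTC-8 in January).
    spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
    df.show()  // 2018-01-01 04:00:00

This is just an illustration of the one semantics Spark has today, not a proposal for how the other two types would look; the doc above covers that.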