Hi,

Sorry if you receive this mail twice, it seems that my first attempt did not make it to the list for some reason.

I would like to start a discussion about SPARK-18350 <https://issues.apache.org/jira/browse/SPARK-18350> before it gets released, because it seems to be going in a different direction than other SQL engines of the Hadoop stack.

ANSI SQL defines the TIMESTAMP type (also known as TIMESTAMP WITHOUT TIME ZONE) to have timezone-agnostic semantics: basically, a type that expresses readings from calendars and clocks and is unaffected by time zone. In the Hadoop stack, Impala has always worked like this, and recently Presto also took steps <https://github.com/prestodb/presto/issues/7122> to become standards compliant. (Presto's design doc <https://docs.google.com/document/d/1UUDktZDx8fGwHZV4VyaEDQURorFbbg6ioeZ5KMHwoCk/edit> also contains a great summary of the different semantics.) Hive has a timezone-agnostic TIMESTAMP type as well (except for Parquet, a major source of incompatibility that is already being addressed <https://issues.apache.org/jira/browse/HIVE-12767>).

A TIMESTAMP in SparkSQL, however, has UTC-normalized local time semantics (except for textfile), which is generally the semantics of the TIMESTAMP WITH TIME ZONE type.

Given that timezone-agnostic TIMESTAMP semantics provide standards compliance and consistency with most SQL engines, I was wondering whether SparkSQL should also consider them in order to become ANSI SQL compliant and interoperable with other SQL engines of the Hadoop stack.

Should SparkSQL adopt these semantics in the future, SPARK-18350 <https://issues.apache.org/jira/browse/SPARK-18350> may turn out to be a source of problems. Please correct me if I'm wrong, but this change seems to explicitly assign TIMESTAMP WITH TIME ZONE semantics to the TIMESTAMP type. I think SPARK-18350 would be a great feature for a separate TIMESTAMP WITH TIME ZONE type, but the plain unqualified TIMESTAMP type would be better off becoming timezone-agnostic instead of gaining further timezone-aware capabilities.
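
To make the difference between the two semantics concrete, here is a minimal Python sketch using only the standard library. It is a simplified model, not how Spark, Impala, Hive, or Presto actually implement TIMESTAMP. The function names, the example reading, and the two session zones (modelled as fixed UTC offsets) are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical session zones, modelled as fixed UTC offsets (assumption:
# no DST handling, which keeps the sketch stdlib-only and deterministic).
WRITER_TZ = timezone(timedelta(hours=2))    # session that writes the value
READER_TZ = timezone(timedelta(hours=-7))   # session that reads it back


def write_agnostic(reading: datetime) -> datetime:
    """Timezone-agnostic: store the calendar/clock reading unchanged."""
    return reading


def read_agnostic(stored: datetime, session_tz: timezone) -> datetime:
    """The session zone plays no role; the same reading comes back."""
    return stored


def write_utc_normalized(reading: datetime, session_tz: timezone) -> datetime:
    """Interpret the reading in the writer's zone and store the UTC instant."""
    return reading.replace(tzinfo=session_tz).astimezone(timezone.utc)


def read_utc_normalized(stored: datetime, session_tz: timezone) -> datetime:
    """Render the stored instant as local time in the reader's zone."""
    return stored.astimezone(session_tz).replace(tzinfo=None)


# Hypothetical wall-clock reading entered by the writer.
reading = datetime(2017, 5, 24, 12, 0, 0)

agnostic = read_agnostic(write_agnostic(reading), READER_TZ)
normalized = read_utc_normalized(
    write_utc_normalized(reading, WRITER_TZ), READER_TZ
)

print(agnostic)    # 2017-05-24 12:00:00
print(normalized)  # 2017-05-24 03:00:00
```

With timezone-agnostic semantics the reader sees the same 12:00 reading that was written, whatever the session zone. With UTC-normalized semantics the displayed value shifts with the reader's zone, which is the behavior normally associated with TIMESTAMP WITH TIME ZONE.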

(Of course, becoming timezone-agnostic would be a behavior change, so it must be optional and configurable by the user, as in Presto.)

I would like to hear your opinions about this concern and about TIMESTAMP semantics in general. Does the community agree that a standards-compliant and interoperable TIMESTAMP type is desired? Do you perceive SPARK-18350 as a potential problem in achieving this, or do I misunderstand the effects of this change?

Thanks,

Zoltan

---

List of links in case in-line links do not work:

- SPARK-18350: https://issues.apache.org/jira/browse/SPARK-18350
- Presto's change: https://github.com/prestodb/presto/issues/7122
- Presto's design doc: https://docs.google.com/document/d/1UUDktZDx8fGwHZV4VyaEDQURorFbbg6ioeZ5KMHwoCk/edit