GitHub user squito opened a pull request: https://github.com/apache/spark/pull/19250
[SPARK-12297] Table timezone correction for Timestamps

## What changes were proposed in this pull request?

When reading and writing data, Spark will adjust timestamp data based on the delta between the current session time zone and the table time zone (specified either by a persistent table property or by an option to the DataFrameReader / DataFrameWriter). This is particularly important for Parquet data, so that it can be treated equivalently by other SQL engines (e.g. Impala and Hive). It is also useful when the same data is processed by multiple clusters in different time zones and "timestamp without time zone" semantics are desired.

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark timestamp_all_formats

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19250.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19250

----

commit 54f87c2c1e0ab0645fa5497553cf031f13e98c3b
Author: Imran Rashid <iras...@cloudera.com>
Date:   2017-08-28T19:52:15Z

    SPARK-12297. Table timezones.

commit 53b9fbe0c6128ec11afdb46d3239c693129f6952
Author: Imran Rashid <iras...@cloudera.com>
Date:   2017-09-14T20:18:46Z

    All data formats support timezone correction.

    Move rules & tests to a more appropriate location. Ensure rule works
    without Hive support. Extra checks on when table timezones are set.

----
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
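For context, the wall-clock adjustment the PR describes can be sketched outside of Spark with Python's standard library. This is a minimal illustration of the idea only: `adjust_timestamp` is a hypothetical helper, not Spark's actual implementation, and the option names Spark uses are not reproduced here. A stored timestamp is interpreted as wall-clock time in the table's time zone, then re-rendered as wall-clock time in the session's time zone:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

def adjust_timestamp(ts: datetime, table_tz: str, session_tz: str) -> datetime:
    """Shift a naive timestamp by the offset delta between the table
    time zone it was written under and the current session time zone.
    (Hypothetical helper illustrating the PR's correction, not Spark code.)"""
    # Interpret the stored naive value as wall-clock time in the table tz...
    as_table = ts.replace(tzinfo=ZoneInfo(table_tz))
    # ...then express the same instant as wall-clock time in the session tz,
    # dropping the tzinfo again to mimic "timestamp without time zone" storage.
    return as_table.astimezone(ZoneInfo(session_tz)).replace(tzinfo=None)

# Noon written by a UTC cluster, read by a session in Los Angeles (PDT, UTC-7):
print(adjust_timestamp(datetime(2017, 9, 14, 12, 0), "UTC", "America/Los_Angeles"))
# 2017-09-14 05:00:00
```

Because the delta is computed per instant, daylight-saving transitions are handled naturally; the same call with a January timestamp would shift by eight hours (PST) rather than seven.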