GitHub user squito opened a pull request:

    https://github.com/apache/spark/pull/19250

    [SPARK-12297] Table timezone correction for Timestamps

    ## What changes were proposed in this pull request?
    
    When reading and writing data, Spark will adjust timestamp data based on
the delta between the current session time zone and the table time zone
(specified either by a persistent table property or by an option to the
DataFrameReader / Writer). This is particularly important for Parquet data,
so that it can be treated equivalently by other SQL engines (e.g. Impala and
Hive). It is also useful when the same data is processed by multiple
clusters in different time zones and "timestamp without time zone" semantics
are desired.
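    
    As a rough illustration, here is a minimal sketch of how the correction
might be invoked from a DataFrameReader / Writer, assuming a hypothetical
`timezone` option key (the actual property and option names are defined by
the patch and are not reproduced here):
    
    import org.apache.spark.sql.SparkSession
    
    // Session time zone: the zone in which timestamps are interpreted.
    val spark = SparkSession.builder()
      .appName("table-timezone-sketch")
      .master("local[*]")
      .config("spark.sql.session.timeZone", "America/Los_Angeles")
      .getOrCreate()
    
    // Write data while declaring the table's time zone, so other engines
    // (e.g. Impala, Hive) interpret the stored timestamps consistently.
    spark.range(10)
      .selectExpr("current_timestamp() AS ts")
      .write
      .option("timezone", "UTC")  // hypothetical option key
      .parquet("/tmp/tz_table")
    
    // On read, timestamps are shifted by the delta between the declared
    // table time zone and the current session time zone.
    spark.read
      .option("timezone", "UTC")  // hypothetical option key
      .parquet("/tmp/tz_table")
      .show(truncate = false)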
    
    ## How was this patch tested?
    
    Unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/squito/spark timestamp_all_formats

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19250
    
----
commit 54f87c2c1e0ab0645fa5497553cf031f13e98c3b
Author: Imran Rashid <iras...@cloudera.com>
Date:   2017-08-28T19:52:15Z

    SPARK-12297.  Table timezones.

commit 53b9fbe0c6128ec11afdb46d3239c693129f6952
Author: Imran Rashid <iras...@cloudera.com>
Date:   2017-09-14T20:18:46Z

    All data formats support timezone correction.  Move rules & tests to a
    more appropriate location.  Ensure the rule works without Hive support.
    Extra checks on when table timezones are set.

----

