[ https://issues.apache.org/jira/browse/SPARK-19561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu resolved SPARK-19561.
--------------------------------
       Resolution: Fixed
         Assignee: Jason White
    Fix Version/s: 2.2.0
                   2.1.1

> Pyspark Dataframes don't allow timestamps near epoch
> ----------------------------------------------------
>
>                 Key: SPARK-19561
>                 URL: https://issues.apache.org/jira/browse/SPARK-19561
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.0.1, 2.1.0
>            Reporter: Jason White
>            Assignee: Jason White
>             Fix For: 2.1.1, 2.2.0
>
>
> PySpark does not allow timestamps at or near the epoch to be created in a 
> DataFrame. Related issue: https://issues.apache.org/jira/browse/SPARK-19299
>
> TimestampType.toInternal converts a datetime object to the number of 
> microseconds since the epoch. For any time more than 2148 seconds before or 
> after 1970-01-01T00:00:00+0000, the absolute value of this number exceeds 
> 2^31, and Py4J automatically serializes it as a long.
>
> However, for times within that range (roughly 35 minutes either side of the 
> epoch), Py4J serializes it as an int. When the row is built on the Scala 
> side, the int is not recognized and the value silently becomes null. This 
> leads to null values in non-nullable fields and corrupted Parquet files.
>
> The solution is trivial: force TimestampType.toInternal to always return a 
> long.
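>
> As an illustration, a minimal sketch of the conversion involved (the name 
> to_internal and its logic only loosely mirror 
> pyspark.sql.types.TimestampType.toInternal; this is not the exact Spark 
> source):
>
>     import calendar
>     from datetime import datetime
>
>     def to_internal(dt):
>         # Microseconds since the Unix epoch: the internal representation
>         # used by TimestampType (the real toInternal also handles
>         # timezone-naive datetimes, which are treated as local time).
>         seconds = calendar.timegm(dt.utctimetuple())
>         return int(seconds) * 1000000 + dt.microsecond
>
>     # 10 minutes after the epoch: 600000000 microseconds, well below 2^31.
>     # On Python 2, an unfixed conversion yields a plain int here, which
>     # Py4J sends to the JVM as an Integer rather than the Long that
>     # Spark's TimestampType expects, so the value ends up null.
>     print(to_internal(datetime(1970, 1, 1, 0, 10)))  # 600000000
>
> On Python 2, where int and long are distinct types, the fix amounts to 
> ensuring the returned value is always a long (for example by coercing with 
> long()); the exact change applied in Spark may differ in detail.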


