[ https://issues.apache.org/jira/browse/SPARK-16394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382716#comment-15382716 ]
Martin Tapp commented on SPARK-16394:
-------------------------------------

We found it also happens when you take a Python datetime and use it to instantiate a Spark DataFrame timestamp.

> Timestamp conversion error in pyspark.sql.Row because of timezones
> ------------------------------------------------------------------
>
>                 Key: SPARK-16394
>                 URL: https://issues.apache.org/jira/browse/SPARK-16394
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.6.1
>            Reporter: Martin Tapp
>            Priority: Minor
>
> We use DataFrame.map to convert each row to a dictionary using Row.asDict().
> The problem occurs when a Timestamp column is converted: the Timestamp is
> converted to a naive Python datetime. This causes processing errors, since
> all naive datetimes are adjusted to the process's timezone. For instance, a
> Timestamp with a time of midnight sees its time shift based on the local
> timezone (+/- x hours).
> The current workaround is to apply the pytz.utc timezone to each datetime
> instance.
> The proposed solution is to make all datetime instances timezone-aware,
> using the pytz.utc timezone.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
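A minimal sketch of the workaround described in the issue: attach UTC to the naive datetimes that come back from a Timestamp column so later code cannot reinterpret them in the process's local timezone. This uses the standard library's `timezone.utc` in place of `pytz.utc` purely to keep the example self-contained, and `make_utc_aware` is a hypothetical helper name, not a Spark API.

```python
from datetime import datetime, timezone

def make_utc_aware(dt):
    """Attach UTC to a naive datetime; leave aware datetimes unchanged."""
    if dt.tzinfo is None:
        # Naive datetime from a Spark Timestamp column: pin it to UTC
        # instead of letting it be interpreted in the local timezone.
        return dt.replace(tzinfo=timezone.utc)
    return dt

# Applied to each datetime value in a dict produced by Row.asDict():
naive = datetime(2016, 7, 5, 0, 0, 0)   # midnight, naive
aware = make_utc_aware(naive)           # same wall-clock time, now UTC-aware
```

With this in place, a midnight timestamp stays at midnight regardless of the timezone the worker process runs in.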