Gerhard Fiedler created SPARK-12683:
---------------------------------------

             Summary: SQL timestamp is wrong when accessed as Python datetime
                 Key: SPARK-12683
                 URL: https://issues.apache.org/jira/browse/SPARK-12683
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 1.6.0, 1.5.2, 1.5.1
         Environment: Windows 7 Pro x64
Python 3.4.3
py4j 0.9
            Reporter: Gerhard Fiedler


When SQL timestamp data is displayed through {{.show()}}, it looks correct, but 
when it is retrieved (as a Python {{datetime}}) through {{.collect()}}, the hour 
is wrong.

{code}
from datetime import datetime
from pyspark import SparkContext
from pyspark.sql import SQLContext


if __name__ == "__main__":
    spark_context = SparkContext(appName='SparkBugTimestampHour')
    sql_context = SQLContext(spark_context)

    sql_text = """select cast('2100-09-09 12:11:10.09' as timestamp) as ts"""
    data_frame = sql_context.sql(sql_text)
    data_frame.show(truncate=False)

    # Result from .show() (as expected, looks correct):
    # +----------------------+
    # |ts                    |
    # +----------------------+
    # |2100-09-09 12:11:10.09|
    # +----------------------+

    rows = data_frame.collect()
    row = rows[0]
    ts = row[0]

    print('ts={ts}'.format(ts=ts))
    # Expected result from this print statement:
    # ts=2100-09-09 12:11:10.090000
    #
    # Actual, wrong result (note the hours being 18 instead of 12):
    # ts=2100-09-09 18:11:10.090000
    #
    # This error seems to be dependent on some characteristic of the system.
    # We couldn't reproduce this on all of our systems, but it is not clear
    # what the differences are. One difference is the processor: it failed
    # on Intel Xeon E5-2687W v2.

    assert isinstance(ts, datetime)
    assert ts.year == 2100 and ts.month == 9 and ts.day == 9
    assert ts.minute == 11 and ts.second == 10 and ts.microsecond == 90000
    if ts.hour != 12:
        print('hour is not correct; should be 12, is actually {hour}'.format(hour=ts.hour))

    spark_context.stop()
{code}
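
To quantify the discrepancy on an affected system, the string printed by {{.show()}} can be compared against the {{datetime}} returned by {{.collect()}}. The following is only a minimal sketch in plain Python; {{hour_offset}} is a hypothetical helper name, not part of PySpark, and the two values passed in are simply the ones quoted in the report above. A whole-hour offset might point toward a time-zone or DST-related conversion difference, but that is an assumption and has not been confirmed here.

```python
from datetime import datetime


def hour_offset(shown, collected):
    """Return (collected - shown) in hours.

    `shown` is the timestamp text as printed by .show();
    `collected` is the datetime object obtained from .collect().
    Hypothetical helper for diagnosis only.
    """
    # %f accepts 1 to 6 fractional digits, so '.09' parses as 90000 microseconds.
    expected = datetime.strptime(shown, "%Y-%m-%d %H:%M:%S.%f")
    return (collected - expected).total_seconds() / 3600.0


# Values taken from the report: expected 12:11:10.09, actual 18:11:10.09.
offset = hour_offset("2100-09-09 12:11:10.09",
                     datetime(2100, 9, 9, 18, 11, 10, 90000))
print("offset in hours: {0}".format(offset))
```

Running the same comparison on a system that does not show the problem should report an offset of zero, which may help narrow down what differs between the machines.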



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
