Arnaud Caruso created SPARK-13837:
-------------------------------------

             Summary: SQL Context function to_date() returns wrong date
                 Key: SPARK-13837
                 URL: https://issues.apache.org/jira/browse/SPARK-13837
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.1
         Environment: Python 2.7.6 (default, Mar 22 2014, 22:59:56) [GCC 4.8.2]
            Reporter: Arnaud Caruso


The SQL Context function to_date sometimes returns the wrong date when it is 
applied to a timestamp.

Here's how to reproduce the bug in Python:
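import datetime
from pyspark.sql.types import StructField, StructType, TimestampType

# Assumes `sc` and `sqlCtx` already exist, as in the pyspark shell.
data = [[datetime.datetime(2015, 2, 20, 0, 0, 2)],
        [datetime.datetime(2015, 10, 9, 0, 0, 2)]]
# Not used below; createDataFrame accepts the list directly.
rddData = sc.parallelize(data)
fields = [StructField('timestamp', TimestampType(), True)]
schema = StructType(fields)
data_table = sqlCtx.createDataFrame(data, schema)
sqlCtx.registerDataFrameAsTable(data_table, "data")
query = "SELECT timestamp, TO_DATE(timestamp) FROM data"
df = sqlCtx.sql(query)
df.collect()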

Here are the results I get:
[Row(timestamp=datetime.datetime(2015, 2, 20, 0, 0, 2), _c1=datetime.date(2015, 2, 20)),
 Row(timestamp=datetime.datetime(2015, 10, 9, 0, 0, 2), _c1=datetime.date(2015, 10, 8))]

The first date is right, but the second is wrong: TO_DATE returns October 8th 
instead of October 9th.
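
The mismatch can also be checked without Spark. The following is a minimal 
sketch, not part of the original reproduction: it hard-codes the timestamps and 
the dates reported above, and compares each reported date with the calendar 
date of its timestamp. The helper name find_mismatches is hypothetical and 
chosen only for illustration.

```python
import datetime

# Input timestamps, copied from the reproduction above.
timestamps = [
    datetime.datetime(2015, 2, 20, 0, 0, 2),
    datetime.datetime(2015, 10, 9, 0, 0, 2),
]

# Dates reported as returned by TO_DATE, copied from the results above.
reported = [
    datetime.date(2015, 2, 20),
    datetime.date(2015, 10, 8),
]


def find_mismatches(timestamps, reported):
    """Return (timestamp, expected, got) for each row whose date differs.

    Hypothetical helper: the expected value is simply the calendar date
    of the timestamp, which is what the report assumes TO_DATE should give.
    """
    mismatches = []
    for ts, got in zip(timestamps, reported):
        expected = ts.date()
        if got != expected:
            mismatches.append((ts, expected, got))
    return mismatches


mismatches = find_mismatches(timestamps, reported)
for ts, expected, got in mismatches:
    print("%s: expected %s, got %s" % (ts, expected, got))
```

With the values from this report, only the October row is flagged. Replacing 
the hard-coded list with the second column of df.collect() would run the same 
check against a live Spark session.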


