Arnaud Caruso created SPARK-13837:
-------------------------------------

             Summary: SQL Context function to_date() returns wrong date
                 Key: SPARK-13837
                 URL: https://issues.apache.org/jira/browse/SPARK-13837
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.1
         Environment: Python version: 2.7.6 (default, Mar 22 2014, 22:59:56) [GCC 4.8.2]
            Reporter: Arnaud Caruso

When the SQL Context function to_date is applied to a timestamp, it sometimes returns the wrong date. Here's how to reproduce the bug in Python:

    import datetime
    from pyspark.sql.types import StructField, StructType, TimestampType

    # sc and sqlCtx are assumed to be an existing SparkContext and SQLContext
    data = [[datetime.datetime(2015, 2, 20, 0, 0, 2)],
            [datetime.datetime(2015, 10, 9, 0, 0, 2)]]
    rddData = sc.parallelize(data)
    fields = [StructField('timestamp', TimestampType(), True)]
    schema = StructType(fields)
    data_table = sqlCtx.createDataFrame(data, schema)
    sqlCtx.registerDataFrameAsTable(data_table, "data")
    query = "SELECT timestamp, TO_DATE(timestamp) FROM data"
    df = sqlCtx.sql(query)
    df.collect()

Here are the results I get:

    [Row(timestamp=datetime.datetime(2015, 2, 20, 0, 0, 2), _c1=datetime.date(2015, 2, 20)),
     Row(timestamp=datetime.datetime(2015, 10, 9, 0, 0, 2), _c1=datetime.date(2015, 10, 8))]

The first date is right, but the second is wrong: it returns October 8th instead of October 9th.
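
The mismatch described in this report can also be checked mechanically rather than by eye. The following is a minimal sketch in plain Python (no Spark required) that compares each returned date against the calendar date of its timestamp. The helper name find_date_mismatches is hypothetical and not part of any Spark API; the input values are copied from the results quoted in the report.

```python
import datetime


def find_date_mismatches(rows):
    """Return the (timestamp, date) pairs whose date differs from timestamp.date().

    Hypothetical helper for illustration only; not part of Spark.
    """
    return [(ts, d) for ts, d in rows if ts.date() != d]


# Values copied from the results quoted in the report.
reported = [
    (datetime.datetime(2015, 2, 20, 0, 0, 2), datetime.date(2015, 2, 20)),
    (datetime.datetime(2015, 10, 9, 0, 0, 2), datetime.date(2015, 10, 8)),
]

mismatches = find_date_mismatches(reported)
for ts, d in mismatches:
    # Prints: expected 2015-10-09, got 2015-10-08
    print("expected %s, got %s" % (ts.date(), d))
```

Against a live session, the same check could presumably be run on the collected rows, for example find_date_mismatches([(r[0], r[1]) for r in df.collect()]), assuming the query returns the timestamp in the first column and the TO_DATE result in the second, as in the reproduction.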