[ 
https://issues.apache.org/jira/browse/SPARK-25652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Bryński updated SPARK-25652:
-----------------------------------
    Description: 
Hi,
I found strange behaviour in Spark when reading a datetime from the night of the 
daylight saving time change (in CET).

The value read from MySQL is converted incorrectly, and as a result fold=1 is added.

Sample code:

The MySQL column has DATETIME type and the value "2017-10-29 02:01:44".

{code}
spark.read.jdbc(URL).select("time_column").collect()
[Row(start_time=datetime.datetime(2017, 10, 29, 2, 1, 44, fold=1))]
{code}

For comparison, the same query run through SQLAlchemy:

{code}
engine = create_engine(URL)
engine.execute("select time_column from table").fetchone()
(datetime.datetime(2017, 10, 29, 2, 1, 44),)
{code}

I'm using Python 3.6. Both the MySQL server and the machine where I'm running 
the queries are in the CET timezone.
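
The fold=1 in the output above can be reproduced without Spark. The following is a minimal sketch, not taken from the report: it assumes that the Python side builds the value from an epoch timestamp with datetime.fromtimestamp (an assumption about PySpark's conversion), that the process runs on a POSIX system where time.tzset() is available, and that the POSIX TZ string below is an acceptable stand-in for the CET/CEST rules. On 2017-10-29 the wall-clock time 02:01:44 occurs twice, and Python 3.6+ marks the second occurrence with fold=1.

```python
import os
import time
from datetime import datetime, timezone

# Assumption: POSIX platform (time.tzset() is not available on Windows).
# The TZ string encodes CET/CEST rules directly, so no tzdata files are needed:
# DST ends on the last Sunday of October at 03:00 local time.
os.environ["TZ"] = "CET-1CEST,M3.5.0,M10.5.0/3"
time.tzset()

# The two instants that both display as 02:01:44 local time on 2017-10-29.
ts_first = datetime(2017, 10, 29, 0, 1, 44, tzinfo=timezone.utc).timestamp()   # CEST, UTC+2
ts_second = datetime(2017, 10, 29, 1, 1, 44, tzinfo=timezone.utc).timestamp()  # CET, UTC+1

# Naive local datetimes built from epoch seconds.
dt_first = datetime.fromtimestamp(ts_first)
dt_second = datetime.fromtimestamp(ts_second)

print(repr(dt_first))
print(repr(dt_second))

# fold is ignored when comparing naive datetimes, so both values compare equal.
print(dt_first == dt_second)

# Dropping the flag gives the same repr that the SQLAlchemy query returns.
print(repr(dt_second.replace(fold=0)))
```

If this assumption holds, the wall-clock fields are the same as the MySQL value and only the fold flag differs, which would mean the epoch value reaching Python corresponds to the second (CET, UTC+1) occurrence of 02:01:44.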



  was:
Hi,
I found strange behaviour in Spark when reading a datetime from the night of the 
daylight saving time change (in CET).

{code}
df.collect()
[Row(start_time=datetime.datetime(2017, 10, 29, 2, 1, 44, fold=1))]
{code}
{code}
df.show()
+-------------------+
|         start_time|
+-------------------+
|2017-10-29 02:01:44|
+-------------------+
{code}
As you can see, fold=1 is added only on the Python side.

Python 3.6

The data came from a MySQL DATETIME column with the value "2017-10-29 02:01:44".


> Wrong datetime conversion between Java and Python 
> --------------------------------------------------
>
>                 Key: SPARK-25652
>                 URL: https://issues.apache.org/jira/browse/SPARK-25652
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>            Reporter: Maciej Bryński
>            Priority: Major


