Andre Menck created SPARK-23290:
-----------------------------------

             Summary: inadvertent change in handling of DateType when converting to pandas dataframe
                 Key: SPARK-23290
                 URL: https://issues.apache.org/jira/browse/SPARK-23290
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.0
            Reporter: Andre Menck
In [this PR|https://github.com/apache/spark/pull/18664/files#diff-6fc344560230bf0ef711bb9b5573f1faR1968] there was a change in how `DateType` is returned to users (line 1968 in dataframe.py). This can break client code, as in the following example from a Python terminal:

{code:python}
>>> pdf = pd.DataFrame([['2015-01-01',1]], columns=['date', 'num'])
>>> pdf.dtypes
date    object
num      int64
dtype: object
>>> pdf['date'].apply(lambda d: dt.datetime.strptime(d, '%Y-%m-%d').date() )
0    2015-01-01
Name: date, dtype: object
>>> pdf = pd.DataFrame([['2015-01-01',1]], columns=['date', 'num'])
>>> pdf.dtypes
date    object
num      int64
dtype: object
>>> pdf['date'] = pd.to_datetime(pdf['date'])
>>> pdf.dtypes
date    datetime64[ns]
num              int64
dtype: object
>>> pdf['date'].apply(lambda d: dt.datetime.strptime(d, '%Y-%m-%d').date() )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/amenck/anaconda2/lib/python2.7/site-packages/pandas/core/series.py", line 2355, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1574, in pandas._libs.lib.map_infer
  File "<stdin>", line 1, in <lambda>
TypeError: strptime() argument 1 must be string, not Timestamp
>>>
{code}

The example above shows both the old behavior (returning an "object" column of strings) and the new behavior (returning a datetime64 column). Since there may be user code relying on the old behavior, I'd suggest reverting this specific part of the change. Also note that the NOTE in the docstring for `_to_corrected_pandas_type` seems to be off: it describes the old behavior, not the current one.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
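Until this is resolved, a possible client-side workaround is to accept either representation when parsing the column. The sketch below is not part of the PR under discussion; the helper name `to_date` is hypothetical, and it simply branches on whether the cell is still a string (old behavior) or a pandas `Timestamp` (new behavior):

{code:python}
import datetime as dt

import pandas as pd


def to_date(d):
    """Hypothetical helper: normalize a cell to a datetime.date.

    Handles both the old toPandas() output (a '%Y-%m-%d' string in an
    object column) and the new output (a pandas Timestamp in a
    datetime64[ns] column).
    """
    if isinstance(d, str):
        return dt.datetime.strptime(d, '%Y-%m-%d').date()
    # pandas.Timestamp exposes .date(), returning a plain datetime.date
    return d.date()


# Old-style object column of strings
pdf_old = pd.DataFrame([['2015-01-01', 1]], columns=['date', 'num'])
old_dates = pdf_old['date'].apply(to_date)

# New-style datetime64[ns] column
pdf_new = pdf_old.copy()
pdf_new['date'] = pd.to_datetime(pdf_new['date'])
new_dates = pdf_new['date'].apply(to_date)
{code}

Both `old_dates` and `new_dates` end up holding `datetime.date` values, so downstream code no longer depends on which representation `toPandas()` produced.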