Andre Menck created SPARK-23290:
-----------------------------------

             Summary: inadvertent change in handling of DateType when converting to pandas dataframe
                 Key: SPARK-23290
                 URL: https://issues.apache.org/jira/browse/SPARK-23290
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.3.0
            Reporter: Andre Menck


In [this PR|https://github.com/apache/spark/pull/18664/files#diff-6fc344560230bf0ef711bb9b5573f1faR1968] there was a change in how `DateType` is returned to users (line 1968 in dataframe.py). This can cause client code to fail, as in the following example from a Python terminal:

{code:python}
>>> import datetime as dt
>>> import pandas as pd
>>> pdf = pd.DataFrame([['2015-01-01',1]], columns=['date', 'num'])
>>> pdf.dtypes
date    object
num      int64
dtype: object
>>> pdf['date'].apply(lambda d: dt.datetime.strptime(d, '%Y-%m-%d').date() )
0    2015-01-01
Name: date, dtype: object
>>> pdf = pd.DataFrame([['2015-01-01',1]], columns=['date', 'num'])
>>> pdf.dtypes
date    object
num      int64
dtype: object
>>> pdf['date'] = pd.to_datetime(pdf['date'])
>>> pdf.dtypes
date    datetime64[ns]
num              int64
dtype: object
>>> pdf['date'].apply(lambda d: dt.datetime.strptime(d, '%Y-%m-%d').date() )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/amenck/anaconda2/lib/python2.7/site-packages/pandas/core/series.py", line 2355, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1574, in pandas._libs.lib.map_infer
  File "<stdin>", line 1, in <lambda>
TypeError: strptime() argument 1 must be string, not Timestamp
>>> 
{code}

The example above shows both the old behavior (the date column is returned with dtype "object") and the new behavior (the date column is returned as datetime64[ns]). Since user code may rely on the old behavior, I'd suggest reverting this specific part of the change. Also note that the NOTE in the docstring of `_to_corrected_pandas_type` seems to be out of date: it describes the old behavior rather than the current one.
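
Until the behavior is settled, client code can be made tolerant of either column type. The following is only a minimal sketch of one possible workaround, not part of Spark or pandas: the helper name `to_date` and its `fmt` parameter are hypothetical, and it is written for Python 3. It relies on the fact that `pandas.Timestamp` is a subclass of `datetime.datetime`, so the same function accepts strings, timestamps, and plain dates. Null values are not handled here.

```python
import datetime as dt


def to_date(value, fmt='%Y-%m-%d'):
    """Return a datetime.date for a string, datetime, or date input.

    Hypothetical helper for illustration only.
    """
    # Check datetime first: datetime.datetime is a subclass of datetime.date,
    # and pandas.Timestamp is in turn a subclass of datetime.datetime.
    if isinstance(value, dt.datetime):
        return value.date()
    # Plain dates are returned unchanged.
    if isinstance(value, dt.date):
        return value
    # Anything else is assumed to be a string in the given format.
    return dt.datetime.strptime(value, fmt).date()
```

With such a helper, the failing lambda above could be replaced by `pdf['date'].apply(to_date)`, which should work whether the column holds strings or timestamps.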


