GitHub user BryanCutler commented on the issue:
    After incorporating date and timestamp types for this, I had to refactor a 
little: Arrow batches are now made from Columns with `_create_batch` from 
serializers even when the user doesn't specify a schema, so that the casts for 
these types can be used. It doesn't seem to affect performance from the initial 
    I came across an issue when creating a Spark DataFrame from a pandas 
DataFrame with timestamps without Arrow. Spark reads the values as long rather 
than datetime, so a test for this currently fails:
    In [1]: spark.conf.set("spark.sql.execution.arrow.enabled", "false")
    In [2]: import pandas as pd
       ...: from datetime import datetime
    In [3]: pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})
    In [4]: df = spark.createDataFrame(pdf)
    In [5]:
    |                 ts|
    In [6]: df.schema
    Out[6]: StructType(List(StructField(ts,LongType,true)))
    In [7]: pdf
    0 2017-10-31 01:01:01
    In [9]: pdf.dtypes
    ts    datetime64[ns]
    dtype: object
    @HyukjinKwon or @ueshin, could you confirm you see the same, and do you 
consider this a bug?
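
    For context on what such a long would hold, here is a minimal sketch. It 
assumes the value Spark infers is the raw `datetime64[ns]` payload, i.e. integer 
nanoseconds since the Unix epoch, which is how pandas stores that dtype. The 
helper names `to_epoch_nanos` and `from_epoch_nanos` are hypothetical and not 
part of Spark or pandas; only the standard library is used.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_nanos(dt):
    """Naive datetime (treated as UTC) -> integer nanoseconds since the epoch."""
    delta = dt.replace(tzinfo=timezone.utc) - EPOCH
    # Integer arithmetic only, so no precision is lost to floats.
    return (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000


def from_epoch_nanos(nanos):
    """Integer nanoseconds since the epoch -> naive datetime (UTC).

    Python datetimes only resolve microseconds, so sub-microsecond
    digits are dropped.
    """
    return (EPOCH + timedelta(microseconds=nanos // 1000)).replace(tzinfo=None)


ts = datetime(2017, 10, 31, 1, 1, 1)   # same value as in the session above
nanos = to_epoch_nanos(ts)
print(nanos)                           # the long that stands in for the timestamp
print(from_epoch_nanos(nanos) == ts)   # the datetime can be recovered from it
```

    If that assumption holds, the data itself is intact and only the type 
information is lost, which is why the schema shows `LongType` for the `ts` 
column.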

