Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/19459

After incorporating date and timestamp types for this, I had to refactor a little to use `_create_batch` from serializers to create the Arrow batches from Columns, so that the casts for these types are applied even when the user doesn't specify a schema (a rough sketch of the kind of cast involved is at the end of this comment). The initial benchmark shows no noticeable effect on performance.

I also came across an issue when creating a DataFrame from a pandas DataFrame with timestamps while Arrow is disabled: Spark reads the values as longs rather than datetimes, so a test for this currently fails.

```
In [1]: spark.conf.set("spark.sql.execution.arrow.enabled", "false")

In [2]: import pandas as pd
   ...: from datetime import datetime
   ...:

In [3]: pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})

In [4]: df = spark.createDataFrame(pdf)

In [5]: df.show()
+-------------------+
|                 ts|
+-------------------+
|1509411661000000000|
+-------------------+

In [6]: df.schema
Out[6]: StructType(List(StructField(ts,LongType,true)))

In [7]: pdf
Out[7]:
                   ts
0 2017-10-31 01:01:01

In [9]: pdf.dtypes
Out[9]:
ts    datetime64[ns]
dtype: object
```

@HyukjinKwon or @ueshin could you confirm you see the same? And do you consider this a bug?
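As promised above, here is a rough sketch of the kind of cast involved, written directly against pyarrow for illustration; this is hypothetical code, not the actual `_create_batch` implementation. The point is that pandas hands over nanosecond-precision timestamps, while Spark's `TimestampType` is microsecond precision:

```
# Illustrative only -- not the serializer code from this PR.
import pandas as pd
import pyarrow as pa
from datetime import datetime

s = pd.Series([datetime(2017, 10, 31, 1, 1, 1)])  # dtype: datetime64[ns]

arr = pa.Array.from_pandas(s)        # arrow type: timestamp[ns]
arr = arr.cast(pa.timestamp("us"))   # cast to match Spark's microsecond TimestampType
batch = pa.RecordBatch.from_arrays([arr], ["ts"])
print(batch.schema)                  # ts: timestamp[us]
```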
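For comparison, when the same value reaches Spark as a plain Python `datetime.datetime` instead of going through a `datetime64[ns]` column, type inference does pick `TimestampType` even with Arrow disabled. A minimal sketch, assuming the same active `spark` session as in the repro above:

```
from datetime import datetime

# Same value, but passed as a plain datetime.datetime rather than
# through a pandas datetime64[ns] column.
df = spark.createDataFrame([(datetime(2017, 10, 31, 1, 1, 1),)], ["ts"])
df.printSchema()
# root
#  |-- ts: timestamp (nullable = true)
```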