Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19646#discussion_r148782931

--- Diff: python/pyspark/sql/session.py ---
@@ -512,9 +557,7 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
         except Exception:
             has_pandas = False
         if has_pandas and isinstance(data, pandas.DataFrame):
-            if schema is None:
-                schema = [str(x) for x in data.columns]
-            data = [r.tolist() for r in data.to_records(index=False)]
--- End diff --

but ... `numpy.datetime64` is not supported in `createDataFrame` IIUC:

```python
import pandas as pd
from datetime import datetime

pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})
print [[v for v in r] for r in pdf.to_records(index=False)]
spark.createDataFrame([[v for v in r] for r in pdf.to_records(index=False)])
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 591, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/.../spark/python/pyspark/sql/session.py", line 404, in _createFromLocal
    struct = self._inferSchemaFromList(data)
  File "/.../spark/python/pyspark/sql/session.py", line 336, in _inferSchemaFromList
    schema = reduce(_merge_type, map(_infer_schema, data))
  File "/.../spark/python/pyspark/sql/types.py", line 1095, in _infer_schema
    fields = [StructField(k, _infer_type(v), True) for k, v in items]
  File "/.../spark/python/pyspark/sql/types.py", line 1072, in _infer_type
    raise TypeError("not supported type: %s" % type(obj))
TypeError: not supported type: <type 'numpy.datetime64'>
```
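As an aside, one way to sidestep this on the caller's side (a sketch, not the fix under review in this PR) is to coerce each `numpy.datetime64` value back to a native `datetime.datetime` before schema inference sees it. The helper name `to_py_datetime` below is hypothetical:

```python
import datetime

import numpy as np

def to_py_datetime(value):
    """Convert a numpy.datetime64 to a plain datetime.datetime.

    numpy.datetime64 carries a time unit; casting to microsecond
    precision first makes .item() return a datetime.datetime
    (for nanosecond precision, .item() would return an int instead).
    """
    return value.astype("datetime64[us]").item()

v = np.datetime64("2017-10-31T01:01:01")
print(to_py_datetime(v))  # 2017-10-31 01:01:01
```

Rows preprocessed this way contain only types that `_infer_type` already understands, so `createDataFrame` can infer a `TimestampType` column as usual.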