Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19459#discussion_r145865969

--- Diff: python/pyspark/sql/session.py ---
@@ -510,6 +578,12 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=True):
         except Exception:
             has_pandas = False
         if has_pandas and isinstance(data, pandas.DataFrame):
+            if self.conf.get("spark.sql.execution.arrow.enabled", "false").lower() == "true" \
+                    and len(data) > 0:
+                df = self._createFromPandasWithArrow(data, schema)
--- End diff --

As of https://github.com/apache/spark/pull/19459#issuecomment-337674952, the `schema` returned by `_parse_datatype_string` might not be a `StructType`:

https://github.com/apache/spark/blob/bfc7e1fe1ad5f9777126f2941e29bbe51ea5da7c/python/pyspark/sql/tests.py#L1325

although, to my knowledge, I don't think we have supported this case with `pd.DataFrame`, since the `int` case resembles a `Dataset` of primitive types:

```
spark.createDataFrame(["a", "b"], "string").show()
+-----+
|value|
+-----+
|    a|
|    b|
+-----+
```

For the `pd.DataFrame` case, it looks like we always have a list of lists:

https://github.com/apache/spark/blob/d492cc5a21cd67b3999b85d97f5c41c3734b1ba3/python/pyspark/sql/session.py#L515

So I think we should only support a list of strings as the schema here, perhaps with a proper exception for the `int` case. Of course, this case should work:

```
>>> spark.createDataFrame(pd.DataFrame([1]), "struct<a: int>").show()
+---+
|  a|
+---+
|  1|
+---+
```
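To make the suggestion concrete, here is a minimal sketch (not the actual patch) of the kind of validation I have in mind: fail fast when the schema passed with a `pd.DataFrame` is not struct-like. `_validate_pandas_schema` is a hypothetical name, and `_parse_datatype_string` is the existing private parser in `pyspark.sql.types` (it needs an active `SparkSession`):

```
from pyspark.sql.types import StructType, _parse_datatype_string

def _validate_pandas_schema(schema):
    # Hypothetical helper: for a pandas.DataFrame, only a StructType, a DDL
    # string that parses to a struct, or a list of column names makes sense.
    if isinstance(schema, str):
        # _parse_datatype_string can return an atomic type, e.g.
        # _parse_datatype_string("int") -> IntegerType(), not a StructType.
        schema = _parse_datatype_string(schema)
    if isinstance(schema, StructType):
        return schema
    if isinstance(schema, list) and all(isinstance(c, str) for c in schema):
        return schema  # column names only; field types are inferred later
    raise TypeError(
        "schema for a pandas.DataFrame should be a StructType or a list of "
        "column names, got: %s" % schema)
```

With a check like this, `spark.createDataFrame(pd.DataFrame([1]), "int")` would raise a readable `TypeError` up front instead of failing deep inside the conversion, while the `struct<a: int>` example above keeps working.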
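One more note on the diff itself: the new branch only triggers when the Arrow conf is enabled and the frame is non-empty. A minimal usage sketch, assuming a running `SparkSession` named `spark`:

```
import pandas as pd

spark.conf.set("spark.sql.execution.arrow.enabled", "true")

pdf = pd.DataFrame({"a": [1, 2, 3]})
# conf enabled + len(pdf) > 0 -> takes the _createFromPandasWithArrow path;
# an empty frame falls back to the regular (non-Arrow) conversion.
df = spark.createDataFrame(pdf)
df.show()
```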