GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/20625
[SPARK-23446][PYTHON] Explicitly check supported types in toPandas

## What changes were proposed in this pull request?

This PR explicitly specifies the types supported in `toPandas`. Previously there was a hole in the type checks. For example, binary type support has not been finished on the Python side yet, but the conversion is currently allowed, as shown below:

```python
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
df = spark.createDataFrame([[bytearray("a")]])
df.toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
df.toPandas()
```

```
     _1
0  [97]
   _1
0   a
```

This should be disallowed. The same applies to nested timestamps. This PR also adds a clearer message about `spark.sql.execution.arrow.enabled` to the error message.

## How was this patch tested?

Manually tested, and tests added in `python/pyspark/sql/tests.py`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark pandas_convertion_supported_type

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20625.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20625

----

commit c79c6df7284b9717fe4e4c26090dcb51bf7712da
Author: hyukjinkwon <gurwls223@...>
Date: 2018-02-16T07:45:52Z

    Explicitly specify supported types in toPandas

----

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
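The kind of guard this PR describes can be sketched as follows. This is a minimal, hypothetical illustration of an explicit supported-type check, not the actual Spark code; the names `SUPPORTED_TYPES` and `check_supported_types`, and the flat `(field, type)` schema shape, are assumptions for the sketch:

```python
# Minimal sketch of a whitelist-style type check performed before
# converting rows to pandas, so unsupported types fail fast with a
# clear message instead of converting silently.

SUPPORTED_TYPES = {"int", "bigint", "double", "string", "boolean", "timestamp"}

def check_supported_types(schema):
    """Raise a clear error for any field whose type is not yet supported.

    `schema` is assumed to be a list of (field_name, type_name) pairs.
    """
    for name, type_name in schema:
        if type_name not in SUPPORTED_TYPES:
            raise ValueError(
                "toPandas does not support field '%s' of type %s. "
                "Note: conversion behavior also depends on "
                "'spark.sql.execution.arrow.enabled'." % (name, type_name)
            )

# An unsupported binary column now raises instead of converting silently:
try:
    check_supported_types([("_1", "binary")])
except ValueError as e:
    print(e)
```

In this sketch the check is a simple whitelist walk over the schema; the real fix lives inside `toPandas` and also covers nested types such as timestamps inside structs.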