GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/20625

    [SPARK-23446][PYTHON] Explicitly check supported types in toPandas

    ## What changes were proposed in this pull request?
    
    This PR explicitly specifies the types supported in `toPandas`. Previously there was a hole: for example, binary type support has not been finished on the Python side yet, but the conversion currently goes through anyway, as shown below:
    
    ```python
    # With Arrow-based conversion disabled
    spark.conf.set("spark.sql.execution.arrow.enabled", "false")
    df = spark.createDataFrame([[bytearray(b"a")]])
    df.toPandas()

    # With Arrow-based conversion enabled, the same DataFrame converts differently
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")
    df.toPandas()
    ```
    
    ```
         _1
    0  [97]
      _1
    0  a
    ```
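
    The first output comes from the non-Arrow path and the second from the Arrow path; the same binary value is rendered differently depending on which conversion path is used.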
    
    This should be disallowed. I think the same applies to nested timestamps as well.
    
    I also added a nicer error message that points users to `spark.sql.execution.arrow.enabled`.
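
    As a rough illustration only (this is not the actual patch; the helper name, the exact set of checked types, and the message wording are assumptions), a minimal sketch of such an explicit check might look like:

    ```python
    from pyspark.sql.types import ArrayType, BinaryType, TimestampType

    def _check_arrow_supported_types(schema):
        # Hypothetical helper: reject types the Arrow conversion path does
        # not handle yet, instead of silently producing different results.
        def unsupported(dt):
            if isinstance(dt, BinaryType):
                return True
            # Nested timestamps, e.g. array<timestamp>, are not handled either.
            if isinstance(dt, ArrayType) and isinstance(dt.elementType, TimestampType):
                return True
            return False

        for field in schema:
            if unsupported(field.dataType):
                raise TypeError(
                    "Unsupported type in conversion to Arrow: %s\n"
                    "Note: toPandas attempted the Arrow optimization because "
                    "spark.sql.execution.arrow.enabled is set to true; set it "
                    "to false to fall back to the non-Arrow conversion."
                    % field.dataType)
    ```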
    
    ## How was this patch tested?
    
    Manually tested and tests added in `python/pyspark/sql/tests.py`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark pandas_convertion_supported_type

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20625.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20625
    
----
commit c79c6df7284b9717fe4e4c26090dcb51bf7712da
Author: hyukjinkwon <gurwls223@...>
Date:   2018-02-16T07:45:52Z

    Explicitly specify supported types in toPandas

----


---
