Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20531#discussion_r166650891

    --- Diff: docs/sql-programming-guide.md ---
    @@ -1734,7 +1734,7 @@ For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/p

     ### Supported SQL Types

    -Currently, all Spark SQL data types are supported by Arrow-based conversion except `MapType`,
    +Currently, all Spark SQL data types are supported by Arrow-based conversion except `BinaryType`, `MapType`,
    --- End diff --

    I was under the impression that we don't support this. It seems Arrow doesn't behave consistently with what Spark does. I think it's actually related to https://github.com/apache/spark/pull/20507.

    I'm hesitant to say this outright, but I believe the root cause is how `str` is handled in Python 2. Technically it's bytes, but it's named "string". As you might already know, due to this confusion, `unicode` became `str` and `str` became `bytes` in Python 3. Spark generally handles this as `StringType`, whereas Arrow seems to treat it as binary.

    I think we shouldn't support this for now, until we get consistent behaviour.
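    (As a side note, not part of the original comment: the Python 2/3 renaming described above can be illustrated from the Python 3 side, where the text/bytes split is explicit. This is a minimal sketch of the language-level distinction only, not of Spark's or Arrow's conversion logic.)

    ```python
    # Python 3: `str` is Unicode text (Python 2's `unicode`),
    # `bytes` is a raw byte sequence (roughly Python 2's `str`).
    text = "café"                  # str: a sequence of code points
    raw = text.encode("utf-8")     # bytes: the UTF-8 encoding of that text

    assert isinstance(text, str)
    assert isinstance(raw, bytes)
    assert raw == b"caf\xc3\xa9"       # the accented char occupies two bytes
    assert raw.decode("utf-8") == text  # decoding round-trips back to text
    ```

    In Python 2, both values above would commonly be called "strings", which is exactly the ambiguity that makes it unclear whether such data should map to `StringType` or `BinaryType`.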