nicolasazrak commented on a change in pull request #34509:
URL: https://github.com/apache/spark/pull/34509#discussion_r758535773
##########
File path: python/pyspark/sql/pandas/serializers.py
##########

```diff
@@ -169,6 +169,8 @@ def create_array(s, t):
         elif is_categorical_dtype(s.dtype):
             # Note: This can be removed once minimum pyarrow version is >= 0.16.1
             s = s.astype(s.dtypes.categories.dtype)
+        elif t is not None and pa.types.is_string(t):
+            s = s.astype(str)
```

Review comment:
   Now that I think about it again, adding support for `StringArray` in Arrow is the real solution; this change is just a workaround. I don't know much about the Arrow internals, or whether some metadata could be added to support two different string types. If you feel this is better done upstream in Arrow, feel free to close the PR and I'll investigate it from the Arrow side. Otherwise, we can keep this patch, add a comment, and continue investigating in Arrow.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org