[ https://issues.apache.org/jira/browse/SPARK-27778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-27778:
------------------------------------

    Assignee: David Vogelbacher

> toPandas with arrow enabled fails for DF with no partitions
> -----------------------------------------------------------
>
>                 Key: SPARK-27778
>                 URL: https://issues.apache.org/jira/browse/SPARK-27778
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: David Vogelbacher
>            Assignee: David Vogelbacher
>            Priority: Major
>
> Calling {{toPandas}} with {{spark.sql.execution.arrow.enabled: true}} fails for
> dataframes with no partitions. The error is an {{EOFError}}. With
> {{spark.sql.execution.arrow.enabled: false}} the conversion succeeds.
> Repro (on current master branch):
> {noformat}
> >>> from pyspark.sql.types import *
> >>> schema = StructType([StructField("field1", StringType(), True)])
> >>> df = spark.createDataFrame(sc.emptyRDD(), schema)
> >>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
> >>> df.toPandas()
> /Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py:2162:
> UserWarning: toPandas attempted Arrow optimization because
> 'spark.sql.execution.arrow.enabled' is set to true, but has reached the error
> below and can not continue. Note that
> 'spark.sql.execution.arrow.fallback.enabled' does not have an effect on
> failures in the middle of computation.
>   warnings.warn(msg)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line 2143, in toPandas
>     batches = self._collectAsArrow()
>   File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line 2205, in _collectAsArrow
>     results = list(_load_from_socket(sock_info, ArrowCollectSerializer()))
>   File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 210, in load_stream
>     num = read_int(stream)
>   File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 810, in read_int
>     raise EOFError
> EOFError
> >>> spark.conf.set("spark.sql.execution.arrow.enabled", "false")
> >>> df.toPandas()
> Empty DataFrame
> Columns: [field1]
> Index: []
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
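For context, the {{EOFError}} in the traceback is raised by the first length read on the Arrow result socket: {{read_int}} in {{pyspark/serializers.py}} expects a 4-byte big-endian integer and raises {{EOFError}} when the stream is already exhausted, which is what happens when the dataframe has zero partitions and no tasks write anything back. A minimal, simplified sketch of that behavior (a stand-in mimicking {{read_int}}, not the actual Spark code or the fix):

```python
import io
import struct

def read_int(stream):
    # Simplified stand-in for pyspark.serializers.read_int: read a
    # 4-byte big-endian signed int, raising EOFError at end-of-stream.
    data = stream.read(4)
    if not data:
        raise EOFError
    return struct.unpack("!i", data)[0]

# A dataframe with no partitions schedules no tasks, so nothing is
# written to the result socket. The very first read then fails:
try:
    read_int(io.BytesIO(b""))
except EOFError:
    pass  # this is the error that toPandas surfaces

# On a non-empty stream the same call succeeds:
assert read_int(io.BytesIO(struct.pack("!i", 42))) == 42
```

This is why the failure only appears on the Arrow path: with Arrow disabled, {{toPandas}} goes through the ordinary {{collect}} route and simply returns an empty pandas DataFrame.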