David Vogelbacher created SPARK-27778:
-----------------------------------------

             Summary: toPandas with arrow enabled fails for DF with no partition
                 Key: SPARK-27778
                 URL: https://issues.apache.org/jira/browse/SPARK-27778
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 3.0.0
            Reporter: David Vogelbacher


Calling {{toPandas}} with {{spark.sql.execution.arrow.enabled: true}} fails for 
dataframes with no partitions. The error is an {{EOFError}}. With 
{{spark.sql.execution.arrow.enabled: false}} the conversion succeeds.

Repro (on current master branch):
{noformat}
>>> from pyspark.sql.types import *
>>> schema = StructType([StructField("field1", StringType(), True)])
>>> df = spark.createDataFrame(sc.emptyRDD(), schema)
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>> df.toPandas()
/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py:2162: 
UserWarning: toPandas attempted Arrow optimization because 
'spark.sql.execution.arrow.enabled' is set to true, but has reached the error 
below and can not continue. Note that 
'spark.sql.execution.arrow.fallback.enabled' does not have an effect on 
failures in the middle of computation.

  warnings.warn(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line 2143, in toPandas
    batches = self._collectAsArrow()
  File "/Users/dvogelbacher/git/spark/python/pyspark/sql/dataframe.py", line 2205, in _collectAsArrow
    results = list(_load_from_socket(sock_info, ArrowCollectSerializer()))
  File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 210, in load_stream
    num = read_int(stream)
  File "/Users/dvogelbacher/git/spark/python/pyspark/serializers.py", line 810, in read_int
    raise EOFError
EOFError
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "false")
>>> df.toPandas()
Empty DataFrame
Columns: [field1]
Index: []
{noformat}
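
The traceback ends in {{read_int}}, called from {{load_stream}}, which suggests the Python side tries to read a leading integer from the socket and finds no bytes. The following is a minimal standard-library sketch of that failure mode, not the PySpark code itself: {{read_int}} here is a stand-in assumed to behave like the helper named in the traceback, {{load_batch_count}} is a hypothetical name, and the idea that nothing is written to the stream for a dataframe with no partitions is an assumption this report does not confirm.

```python
import io
import struct


def read_int(stream):
    """Stand-in for the read_int helper named in the traceback (assumed behavior)."""
    data = stream.read(4)
    if not data:
        # No bytes available: report end-of-file to the caller.
        raise EOFError
    # Decode a 4-byte big-endian signed integer.
    return struct.unpack("!i", data)[0]


def load_batch_count(stream):
    """Hypothetical reader that expects a leading integer on the stream."""
    return read_int(stream)


# A stream that carries a leading integer decodes normally.
print(load_batch_count(io.BytesIO(struct.pack("!i", 3))))

# A stream on which nothing was written fails on the very first read.
try:
    load_batch_count(io.BytesIO(b""))
except EOFError:
    print("EOFError on empty stream")
```

If that assumption holds, a fix could either have the writer always emit something for the reader to consume, or have the reader treat an immediately empty stream as zero batches.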


