[ https://issues.apache.org/jira/browse/SPARK-28881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-28881:
----------------------------------
    Summary: toPandas with Arrow should not return a DataFrame when the result size exceeds `spark.driver.maxResultSize`  (was: toPandas with Arrow returns an empty DataFrame when the result size exceeds `spark.driver.maxResultSize`)

> toPandas with Arrow should not return a DataFrame when the result size exceeds `spark.driver.maxResultSize`
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28881
>                 URL: https://issues.apache.org/jira/browse/SPARK-28881
>             Project: Spark
>          Issue Type: Test
>          Components: PySpark, SQL
>    Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3
>            Reporter: Hyukjin Kwon
>            Priority: Major
>
> {code}
> ./bin/pyspark --conf spark.driver.maxResultSize=1m
> spark.conf.set("spark.sql.execution.arrow.enabled",True)
> spark.range(10000000).toPandas()
> {code}
> The code above returns an empty DataFrame in Spark 2.4, but it should throw an exception as below:
> {code}
> py4j.protocol.Py4JJavaError: An error occurred while calling o31.collectToPython.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 1 tasks (3.0 MB) is bigger than spark.driver.maxResultSize (1024.0 KB)
> {code}
> This is a regression between Spark 2.3 and 2.4.
> This JIRA aims to add a regression test.
> In Spark 2.4:
> {code}
> ./bin/pyspark --conf spark.driver.maxResultSize=1m
> spark.conf.set("spark.sql.execution.arrow.enabled",True)
> spark.range(10000000).toPandas()
> {code}
> {code}
> Empty DataFrame
> Columns: [id]
> Index: []
> {code}
> or it can return partial results:
> {code}
> ./bin/pyspark --conf spark.driver.maxResultSize=1m
> spark.conf.set("spark.sql.execution.arrow.enabled", True)
> spark.range(0, 330000, 1, 100).toPandas()
> {code}
> {code}
> ...
> 75897  75897
> 75898  75898
> 75899  75899
> [75900 rows x 1 columns]
> {code}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
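
For readers who want to check the behaviour described in this issue, here is a minimal sketch of such a check. It is not the regression test added to Spark. The helper names `to_pandas_fails_on_oversized_result` and `make_session` are hypothetical. The sketch assumes a Spark 2.4-style setup where `spark.sql.execution.arrow.enabled` is the relevant key, and that the error text contains the "is bigger than spark.driver.maxResultSize" fragment quoted in the issue.

```python
def to_pandas_fails_on_oversized_result(spark, num_rows=10000000):
    """Return True if toPandas() raises the maxResultSize error.

    Return False if toPandas() returns a DataFrame (empty or partial) or
    fails for an unrelated reason. `spark` is expected to behave like a
    SparkSession created with a small spark.driver.maxResultSize.
    """
    try:
        spark.range(num_rows).toPandas()
    except Exception as exc:
        # Assumption: the Java-side message is part of str(exc), as in the
        # Py4JJavaError output quoted in the issue description.
        return "is bigger than spark.driver.maxResultSize" in str(exc)
    return False


def make_session():
    """Hypothetical helper: build a session matching the reproduction above."""
    # Imported lazily so the check above can be defined without PySpark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")
        # Must be set before the session starts, like --conf on the CLI.
        .config("spark.driver.maxResultSize", "1m")
        .getOrCreate()
    )
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")
    return spark
```

With PySpark installed, `to_pandas_fails_on_oversized_result(make_session())` should return False on an affected 2.4.x release, which returns an empty or partial DataFrame, and True where the exception is raised as expected.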