[ https://issues.apache.org/jira/browse/SPARK-47068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun closed SPARK-47068.
---------------------------------

> Recover -1 and 0 case for spark.sql.execution.arrow.maxRecordsPerBatch
> ----------------------------------------------------------------------
>
>                 Key: SPARK-47068
>                 URL: https://issues.apache.org/jira/browse/SPARK-47068
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.4.1, 3.5.0, 4.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.5.2, 3.4.3, 4.0.0
>
>
> {code}
> import pandas as pd
>
> # Force the Arrow path with no fallback, and set maxRecordsPerBatch to 0.
> spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
> spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 0)
> spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", False)
>
> # Fails with "range() arg 3 must not be zero" (traceback below).
> spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()
>
> # With -1, the same round trip silently returns an empty DataFrame (output below).
> spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", -1)
> spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()
> {code}
> {code}
> /.../spark/python/pyspark/sql/pandas/conversion.py:371: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the error below and will not continue because automatic fallback with 'spark.sql.execution.arrow.pyspark.fallback.enabled' has been set to false.
>   range() arg 3 must not be zero
>   warn(msg)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 1483, in createDataFrame
>     return super(SparkSession, self).createDataFrame(  # type: ignore[call-overload]
>   File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 351, in createDataFrame
>     return self._create_from_pandas_with_arrow(data, schema, timezone)
>   File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 633, in _create_from_pandas_with_arrow
>     pdf_slices = (pdf.iloc[start : start + step] for start in range(0, len(pdf), step))
> ValueError: range() arg 3 must not be zero
> {code}
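>
> Both symptoms come from the batch step being passed straight into range() at line 633 above: a step of 0 is rejected by range() itself, and a negative step yields no start offsets at all, so no batches are produced. A quick illustration:
> {code}
> >>> range(0, 1, 0)         # step from maxRecordsPerBatch = 0
> ValueError: range() arg 3 must not be zero
> >>> list(range(0, 1, -1))  # step from maxRecordsPerBatch = -1: no offsets, so no batches
> []
> {code}
> So with -1 the conversion does not raise, but the round trip silently comes back empty: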
> {code}
> Empty DataFrame
> Columns: [a]
> Index: []
> {code}
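>
> The configuration documents zero or negative values as "no limit", which is what the title asks to recover. A minimal standalone sketch of such a guard around the slicing shown in the traceback (illustrative only; the names here are not the actual PySpark internals, and the merged fix may differ):
> {code}
> import pandas as pd
>
> def slice_for_arrow(pdf: pd.DataFrame, max_records_per_batch: int):
>     """Yield pandas slices of at most max_records_per_batch rows.
>
>     A non-positive setting (0 or -1) is treated as "no per-batch limit":
>     the whole DataFrame becomes one slice instead of feeding a zero or
>     negative step into range().
>     """
>     step = max_records_per_batch if max_records_per_batch > 0 else max(len(pdf), 1)
>     return (pdf.iloc[start : start + step] for start in range(0, len(pdf), step))
>
> print([len(s) for s in slice_for_arrow(pd.DataFrame({'a': [123]}), 0)])   # [1], no ValueError
> print([len(s) for s in slice_for_arrow(pd.DataFrame({'a': [123]}), -1)])  # [1], the row is no longer dropped
> {code}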



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
