Hyukjin Kwon created SPARK-47068:
------------------------------------

             Summary: Recover -1 and 0 case for spark.sql.execution.arrow.maxRecordsPerBatch
                 Key: SPARK-47068
                 URL: https://issues.apache.org/jira/browse/SPARK-47068
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.0, 3.4.1, 4.0.0
            Reporter: Hyukjin Kwon
{code}
import pandas as pd

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", 0)
spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", False)
spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()

spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", -1)
spark.createDataFrame(pd.DataFrame({'a': [123]})).toPandas()
{code}

With {{spark.sql.execution.arrow.maxRecordsPerBatch}} set to 0, createDataFrame fails outright:

{code}
/.../spark/python/pyspark/sql/pandas/conversion.py:371: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the error below and will not continue because automatic fallback with 'spark.sql.execution.arrow.pyspark.fallback.enabled' has been set to false.
  range() arg 3 must not be zero
  warn(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../spark/python/pyspark/sql/session.py", line 1483, in createDataFrame
    return super(SparkSession, self).createDataFrame(  # type: ignore[call-overload]
  File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 351, in createDataFrame
    return self._create_from_pandas_with_arrow(data, schema, timezone)
  File "/.../spark/python/pyspark/sql/pandas/conversion.py", line 633, in _create_from_pandas_with_arrow
    pdf_slices = (pdf.iloc[start : start + step] for start in range(0, len(pdf), step))
ValueError: range() arg 3 must not be zero
{code}

With it set to -1, the call succeeds but silently drops the row and returns an empty DataFrame:

{code}
Empty DataFrame
Columns: [a]
Index: []
{code}

Per the config's documented semantics, a zero or negative value means "no limit", so both cases should be recovered instead of failing or losing data.
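The traceback points at the slicing step in _create_from_pandas_with_arrow, where the config value is passed straight to range() as the step, so 0 raises and a negative value produces no slices at all. A minimal sketch of the kind of guard that restores the "no limit" behavior; this is not the actual Spark patch, and slice_pdf is a hypothetical helper mirroring the slicing logic shown in the traceback:

{code}
import pandas as pd

def slice_pdf(pdf: pd.DataFrame, max_records_per_batch: int):
    # Hypothetical helper mirroring the slicing in _create_from_pandas_with_arrow.
    # Treat 0 and negative values as "no limit": emit everything as one batch.
    # max(len(pdf), 1) keeps range()'s step non-zero even for an empty frame.
    step = max_records_per_batch if max_records_per_batch > 0 else max(len(pdf), 1)
    return (pdf.iloc[start : start + step] for start in range(0, len(pdf), step))

# With the step guarded, both the 0 and -1 cases yield a single one-row slice
# instead of raising or producing an empty result:
pdf = pd.DataFrame({'a': [123]})
assert sum(len(s) for s in slice_pdf(pdf, 0)) == 1
assert sum(len(s) for s in slice_pdf(pdf, -1)) == 1
{code}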