[jira] [Created] (SPARK-43789) Uses 'spark.sql.execution.arrow.maxRecordsPerBatch' in R createDataFrame with Arrow by default

Hyukjin Kwon (Jira) Wed, 24 May 2023 21:45:04 -0700

Hyukjin Kwon created SPARK-43789:
------------------------------------

             Summary: Uses 'spark.sql.execution.arrow.maxRecordsPerBatch' in R createDataFrame with Arrow by default
                 Key: SPARK-43789
                 URL: https://issues.apache.org/jira/browse/SPARK-43789
             Project: Spark
          Issue Type: New Feature
          Components: SparkR
    Affects Versions: 3.5.0
            Reporter: Hyukjin Kwon



Currently, `createDataFrame` uses `1` for `numPartitions` by default, which is
not realistic. It should use a larger number of partitions by default.

In PySpark, the input data is chunked into batches of
'spark.sql.execution.arrow.maxRecordsPerBatch' records. SparkR should follow
the same approach.
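As a rough illustration of the chunking described above, here is a minimal Python sketch. It assumes that 'spark.sql.execution.arrow.maxRecordsPerBatch' keeps its documented default of 10,000 and that a zero or negative value means "no limit" (all rows in one chunk). The helper names `chunk_by_max_records` and `num_partitions_for` are hypothetical; they are not part of the PySpark or SparkR API and are not the actual implementation.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Assumption: default value of spark.sql.execution.arrow.maxRecordsPerBatch.
DEFAULT_MAX_RECORDS_PER_BATCH = 10_000


def chunk_by_max_records(
    rows: Sequence[T], max_records_per_batch: int = DEFAULT_MAX_RECORDS_PER_BATCH
) -> List[List[T]]:
    """Hypothetical helper: split rows into consecutive chunks of at most
    max_records_per_batch rows each.

    A non-positive limit is treated as "no limit", so every row lands in a
    single chunk. Empty input yields one empty chunk (one partition).
    """
    if max_records_per_batch <= 0 or len(rows) == 0:
        return [list(rows)]
    return [
        list(rows[start:start + max_records_per_batch])
        for start in range(0, len(rows), max_records_per_batch)
    ]


def num_partitions_for(
    n_rows: int, max_records_per_batch: int = DEFAULT_MAX_RECORDS_PER_BATCH
) -> int:
    """Hypothetical helper: number of chunks (partitions) the split above
    produces, i.e. ceil(n_rows / max_records_per_batch), but at least 1."""
    if max_records_per_batch <= 0:
        return 1
    return max(1, math.ceil(n_rows / max_records_per_batch))
```

With 25,000 input rows and the default limit, `num_partitions_for(25000)` gives 3 partitions (10,000 + 10,000 + 5,000 rows) instead of the single partition used today.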



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

