Hyukjin Kwon created SPARK-43789:
------------------------------------

             Summary: Uses 'spark.sql.execution.arrow.maxRecordsPerBatch' in R createDataFrame with Arrow by default
                 Key: SPARK-43789
                 URL: https://issues.apache.org/jira/browse/SPARK-43789
             Project: Spark
          Issue Type: New Feature
          Components: SparkR
    Affects Versions: 3.5.0
            Reporter: Hyukjin Kwon


Currently, createDataFrame uses `1` for numPartitions by default, which isn't 
realistic for non-trivial input. It should use a larger number of partitions by default.

In PySpark, we chunk the input data into batches of at most 
'spark.sql.execution.arrow.maxRecordsPerBatch' records each. SparkR should 
follow the same approach.
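
For reference, a minimal sketch of the chunking PySpark performs before 
converting the input to Arrow record batches (the helper name and standalone 
form here are hypothetical; the real logic lives in PySpark's Arrow conversion 
path):

```python
import pandas as pd

def chunk_by_max_records(pdf: pd.DataFrame, max_records_per_batch: int):
    """Split a pandas DataFrame into slices of at most
    max_records_per_batch rows; each slice becomes one Arrow batch."""
    step = max(int(max_records_per_batch), 1)  # guard against a zero/negative setting
    return [pdf.iloc[start:start + step] for start in range(0, len(pdf), step)]

# With the default spark.sql.execution.arrow.maxRecordsPerBatch = 10000,
# a 25,000-row input yields 3 chunks, hence 3 partitions rather than 1.
pdf = pd.DataFrame({"x": range(25000)})
print(len(chunk_by_max_records(pdf, 10000)))  # 3
```

Applying the same chunking in SparkR would make the default parallelism scale 
with the input size instead of fixing it at a single partition.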



