Hyukjin Kwon created SPARK-43789:
------------------------------------
Summary: Uses 'spark.sql.execution.arrow.maxRecordsPerBatch' in R
createDataFrame with Arrow by default
Key: SPARK-43789
URL: https://issues.apache.org/jira/browse/SPARK-43789
Project: Spark
Issue Type: New Feature
Components: SparkR
Affects Versions: 3.5.0
Reporter: Hyukjin Kwon
Currently, createDataFrame in SparkR uses `1` for numPartitions by default,
which isn't realistic for anything but small inputs. The default should be a
larger number of partitions.
In PySpark, we chunk the input data into batches of
'spark.sql.execution.arrow.maxRecordsPerBatch' records. SparkR should follow
the same approach.
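
As a rough illustration of the chunking idea (not the actual PySpark or SparkR
implementation), the sketch below splits local rows into slices of at most
`max_records_per_batch` records and derives a default partition count from
that. The function names `chunk_records` and `default_num_partitions` are
hypothetical, the default of 10000 is assumed to match the documented default
of 'spark.sql.execution.arrow.maxRecordsPerBatch', and treating a non-positive
value as "no limit" is likewise an assumption here.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Assumption: mirrors the documented default of
# spark.sql.execution.arrow.maxRecordsPerBatch.
DEFAULT_MAX_RECORDS_PER_BATCH = 10000


def chunk_records(
    rows: Sequence[T],
    max_records_per_batch: int = DEFAULT_MAX_RECORDS_PER_BATCH,
) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `rows`, each holding at most
    `max_records_per_batch` records (hypothetical helper)."""
    if max_records_per_batch <= 0:
        # Assumption: a non-positive setting means "no limit", so all
        # rows go into a single chunk.
        yield rows
        return
    for start in range(0, len(rows), max_records_per_batch):
        yield rows[start:start + max_records_per_batch]


def default_num_partitions(
    num_rows: int,
    max_records_per_batch: int = DEFAULT_MAX_RECORDS_PER_BATCH,
) -> int:
    """Return how many chunks `chunk_records` would produce, with a floor
    of 1 so that empty input still maps to one partition."""
    if max_records_per_batch <= 0 or num_rows <= 0:
        return 1
    # Ceiling division without floating point.
    return -(-num_rows // max_records_per_batch)


if __name__ == "__main__":
    rows = list(range(25))
    sizes = [len(chunk) for chunk in chunk_records(rows, 10)]
    print(sizes)
    print(default_num_partitions(len(rows), 10))
```

With a scheme like this, the default number of partitions grows with the input
size instead of staying fixed at `1`, while an explicitly passed numPartitions
could still take precedence.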