Hyukjin Kwon created SPARK-43789:
------------------------------------

             Summary: Uses 'spark.sql.execution.arrow.maxRecordsPerBatch' in R createDataFrame with Arrow by default
                 Key: SPARK-43789
                 URL: https://issues.apache.org/jira/browse/SPARK-43789
             Project: Spark
          Issue Type: New Feature
          Components: SparkR
    Affects Versions: 3.5.0
            Reporter: Hyukjin Kwon
Currently, createDataFrame uses `1` for numPartitions by default, which isn't realistic for non-trivial local data; it should default to a larger number of partitions. In PySpark, the input data is chunked into batches of 'spark.sql.execution.arrow.maxRecordsPerBatch' records, and SparkR should follow the same approach.
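A minimal sketch of what the proposed behavior would look like from the user side. The two configuration keys are real Spark settings; the batch-size value is an arbitrary example, and the partitioning behavior described in the comments is the proposal, not current SparkR behavior:

    # Minimal sketch, assuming SparkR with the Arrow optimization enabled.
    library(SparkR)
    sparkR.session(sparkConfig = list(
      "spark.sql.execution.arrow.sparkr.enabled" = "true",
      # Proposed: chunk the local R data.frame into Arrow batches of this
      # size instead of shipping everything as a single partition.
      "spark.sql.execution.arrow.maxRecordsPerBatch" = "5000"
    ))

    df <- createDataFrame(mtcars)
    # Today this returns 1; with the proposed change it would reflect the
    # number of Arrow batches, i.e. ceiling(nrow(mtcars) / maxRecordsPerBatch).
    getNumPartitions(df)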