Ian Cook created SPARK-47466:
--------------------------------

             Summary: Add PySpark DataFrame method to return iterator of PyArrow RecordBatches
                 Key: SPARK-47466
                 URL: https://issues.apache.org/jira/browse/SPARK-47466
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.5.1
            Reporter: Ian Cook

As a follow-up to SPARK-47365:

*toArrow()* is useful when the data is relatively small, since it returns the whole result as a single PyArrow Table. For larger data, a better way to return the contents of a PySpark DataFrame in Arrow format is as an iterator of [PyArrow RecordBatches|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html], which lets the caller process the result one batch at a time.