[ https://issues.apache.org/jira/browse/SPARK-47466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Cook updated SPARK-47466:
-----------------------------
    Description:
As a follow-up to SPARK-47365:
{{toArrow()}} is useful when the data is relatively small. For larger data, the best way to return the contents of a PySpark DataFrame in Arrow format is to return an iterator of [PyArrow RecordBatches|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html].

  was:
As a follow-up to SPARK-47365:
*toArrow()* is useful when the data is relatively small. For larger data, the best way to return the contents of a PySpark DataFrame in Arrow format is to return an iterator of [PyArrow RecordBatches|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html].

> Add PySpark DataFrame method to return iterator of PyArrow RecordBatches
> ------------------------------------------------------------------------
>
>                 Key: SPARK-47466
>                 URL: https://issues.apache.org/jira/browse/SPARK-47466
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.5.1
>            Reporter: Ian Cook
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)