[jira] [Created] (SPARK-47466) Add PySpark DataFrame method to return iterator of PyArrow RecordBatches

Ian Cook (Jira) Tue, 19 Mar 2024 10:01:50 -0700

Ian Cook created SPARK-47466:
--------------------------------

             Summary: Add PySpark DataFrame method to return iterator of PyArrow RecordBatches
                 Key: SPARK-47466
                 URL: https://issues.apache.org/jira/browse/SPARK-47466
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
    Affects Versions: 3.5.1
            Reporter: Ian Cook



As a follow-up to SPARK-47365:

*toArrow()* is useful when the data is relatively small. For larger data, the
best way to return the contents of a PySpark DataFrame in Arrow format is as an
iterator of [PyArrow RecordBatches|https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html],
so the caller can process one batch at a time instead of holding the entire
result in memory at once.
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

