Dewey Dunnington created SEDONA-723:
---------------------------------------
Summary: Add Arrow write format
Key: SEDONA-723
URL: https://issues.apache.org/jira/browse/SEDONA-723
Project: Apache Sedona
Issue Type: Improvement
Reporter: Dewey Dunnington
In SEDONA-660, SEDONA-714, and SEDONA-717, we wired up the ArrowSerializer from
SparkConnect to accelerate transfer between the JVM and Python on the driver.
For queries whose result size is arbitrarily large or unknown at the time of
issuing the query, this can result in out-of-memory errors, so it would be
helpful to have an escape hatch that writes the results to Arrow files instead.
Writing Arrow files is also a useful way for Sedona users to build
services on top of Sedona (e.g., by returning the URLs to the written Arrow
files as described in
https://arrow.apache.org/blog/2025/01/10/arrow-result-transfer/ ).
This should probably be a feature of Spark itself; however, I don't think the
existing conversion infrastructure is flexible enough to handle it. I'll put up
a draft PR exploring the idea to see if there is interest!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)