paleolimbot opened a new issue, #1756:
URL: https://github.com/apache/sedona/issues/1756
I'm still getting to know how input/output works for Spark/Sedona, but I
noticed that there's a `_collect_as_arrow()` method on data frames (I believe
this is exposed as `.toArrow()` in pyspark 4.0.0). I'm wondering if there's
an opportunity to implement something like `Adapter.to_geoarrow()` to provide
compatibility with tools that import/export GeoArrow (e.g., geopandas, geoarrow-rs).
```python
import geopandas
from sedona.spark import SedonaContext

# Start a Sedona-enabled Spark session
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Round-trip a geopandas GeoDataFrame through Spark and collect as Arrow
gs = geopandas.GeoSeries.from_wkt(["POINT (1 2)"])
gdf = geopandas.GeoDataFrame({"geometry": gs})
sedona.createDataFrame(gdf)._collect_as_arrow()
#> [pyarrow.RecordBatch
#> geometry: binary
#> ----
#> geometry: [1200000001000000000000000000F03F0000000000000040]]
```
I am assuming that the binary here is the same binary that is
serialized/deserialized in
https://github.com/apache/sedona/tree/52b6ae8e71601cdf36a6176198839bc3daf5547c/python/src.
A simple first step might be converting that serialization to WKB and exporting
`geoarrow.wkb`, which has the widest compatibility (a rough sketch is below).
With some information about which geometry types are present, it would also be
possible to generate "native" GeoArrow (i.e., all coordinates in contiguous
buffers with separate buffers for part/ring offsets).
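
To make the WKB route concrete, here's a minimal sketch of what an
`Adapter.to_geoarrow()`-style helper could do from the Python side, assuming
the geometry column is converted to WKB with `ST_AsBinary` inside Spark before
collecting; the helper name `collect_as_wkb_table` is just a placeholder, not an
existing API:
```python
# Sketch only, not an actual Sedona API: convert geometry to WKB inside Spark
# so the collected Arrow payload is plain WKB rather than Sedona's internal
# serialization, then collect it as a pyarrow Table.
import pyarrow as pa
import geopandas
from sedona.spark import SedonaContext


def collect_as_wkb_table(df, geometry_col="geometry"):
    # Replace the geometry column with its WKB encoding via ST_AsBinary
    wkb_df = df.selectExpr(
        *[c for c in df.columns if c != geometry_col],
        f"ST_AsBinary({geometry_col}) AS {geometry_col}",
    )
    batches = wkb_df._collect_as_arrow()  # .toArrow() in pyspark 4.0.0
    # A real Adapter.to_geoarrow() could tag this column with the
    # geoarrow.wkb extension type (e.g., via geoarrow-pyarrow) at this point.
    return pa.Table.from_batches(batches)


# Usage: round-trip back into geopandas via WKB
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
gdf = geopandas.GeoDataFrame(
    {"geometry": geopandas.GeoSeries.from_wkt(["POINT (1 2)"])}
)
table = collect_as_wkb_table(sedona.createDataFrame(gdf))
geopandas.GeoSeries.from_wkb(table["geometry"].to_pylist())
```
This avoids touching the internal serialization at all, at the cost of an extra
conversion inside Spark; tagging the collected column with the `geoarrow.wkb`
extension type would be the remaining step for full GeoArrow interoperability.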