paleolimbot opened a new issue, #1756:
URL: https://github.com/apache/sedona/issues/1756
I'm still getting to know how input/output works for Spark/Sedona, but I
noticed that there's a `_collect_as_arrow()` method on data frames (I believe
this is exposed as `.toArrow()` in pyspark 4.0.0). I'm wondering if there's
an opportunity to implement something like `Adapter.to_geoarrow()` to provide
compatibility with tools that import/export GeoArrow (e.g., geopandas, geoarrow-rs).
```python
import geopandas
from sedona.spark import SedonaContext

# Start a Sedona-enabled Spark session
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Round-trip a geopandas GeoDataFrame through Spark and collect as Arrow
gs = geopandas.GeoSeries.from_wkt(["POINT (1 2)"])
gdf = geopandas.GeoDataFrame({"geometry": gs})
sedona.createDataFrame(gdf)._collect_as_arrow()
#> [pyarrow.RecordBatch
#> geometry: binary
#> ----
#> geometry: [1200000001000000000000000000F03F0000000000000040]]
```
I am assuming that the binary here is the same binary that is
serialized/deserialized in
https://github.com/apache/sedona/tree/52b6ae8e71601cdf36a6176198839bc3daf5547c/python/src.
A simple first step might be converting that serialization to WKB and exporting
`geoarrow.wkb`, which has the widest compatibility (a rough sketch is below).
With some information about which geometry types are present, it would also be
possible to generate "native" GeoArrow (i.e., all coordinates in contiguous
buffers with separate buffers for part/ring offsets).
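
To make the WKB route concrete, here's a minimal sketch of what an
`Adapter.to_geoarrow()`-style helper could do from the Python side, assuming
the geometry column is converted to WKB with `ST_AsBinary` inside Spark before
collecting; the helper name `collect_as_wkb_table` is just a placeholder, not an
existing API:
```python
# Sketch only, not an actual Sedona API: convert geometry to WKB inside Spark
# so the collected Arrow payload is plain WKB rather than Sedona's internal
# serialization, then collect it as a pyarrow Table.
import pyarrow as pa
import geopandas
from sedona.spark import SedonaContext


def collect_as_wkb_table(df, geometry_col="geometry"):
    # Replace the geometry column with its WKB encoding via ST_AsBinary
    wkb_df = df.selectExpr(
        *[c for c in df.columns if c != geometry_col],
        f"ST_AsBinary({geometry_col}) AS {geometry_col}",
    )
    batches = wkb_df._collect_as_arrow()  # .toArrow() in pyspark 4.0.0
    # A real Adapter.to_geoarrow() could tag this column with the
    # geoarrow.wkb extension type (e.g., via geoarrow-pyarrow) at this point.
    return pa.Table.from_batches(batches)


# Usage: round-trip back into geopandas via WKB
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)
gdf = geopandas.GeoDataFrame(
    {"geometry": geopandas.GeoSeries.from_wkt(["POINT (1 2)"])}
)
table = collect_as_wkb_table(sedona.createDataFrame(gdf))
geopandas.GeoSeries.from_wkb(table["geometry"].to_pylist())
```
This avoids touching the internal serialization at all, at the cost of an extra
conversion inside Spark; tagging the collected column with the `geoarrow.wkb`
extension type would be the remaining step for full GeoArrow interoperability.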