paleolimbot opened a new issue, #7240:
URL: https://github.com/apache/arrow-rs/issues/7240

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   I'd like to be able to read and/or write Parquet files with the new GEOMETRY 
and GEOGRAPHY types!
   
   - Spec references: 
https://github.com/apache/parquet-format/blob/master/Geospatial.md + 
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L240-L261
   - C++ implementation PR: https://github.com/apache/arrow/pull/45459
   - Java implementation PR: https://github.com/apache/parquet-java/pull/2971
   - Test files: https://github.com/apache/parquet-testing/pull/70 (and a few 
bigger ones at https://github.com/geoarrow/geoarrow-data )
   
   **Describe the solution you'd like**
   
   Support for read and/or write (perhaps read first and then write).
   
   **Describe alternatives you've considered**
   
   **Additional context**
   
   I think the main issue is what Arrow type to read into. The Parquet types 
have type-level metadata (a coordinate reference system and, for geography, an 
edge interpolation algorithm) which can be propagated via the `geoarrow.wkb` 
extension type 
(https://github.com/geoarrow/geoarrow/blob/main/extension-types.md#extension-metadata).
The most complicated mapping scenario looks something like:
   
   Parquet: `GEOGRAPHY(crs=projjson:some_file_metadata_field, algorithm=spherical)` -> Arrow: `geoarrow.wkb` + `{"crs": {<the actual projjson>}, "edges": "spherical"}`
   
   (The fact that the Parquet spec "recommends" putting the actual PROJJSON 
into the file metadata is something I tried to discourage when negotiating the 
spec change but was not ultimately successful).
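
To make the mapping above concrete, here is a minimal std-only Rust sketch of how a reader might turn the Parquet annotation into Arrow field metadata. Everything named here is hypothetical and for illustration only: `GeoLogicalType`, `resolve_crs`, and `geoarrow_field_metadata` are not existing arrow-rs or parquet APIs, the edge algorithm is assumed to already be a lowercase string, the referenced file metadata value is assumed to be valid PROJJSON, and what to emit when the CRS is unset is deliberately left open. The only names taken from existing conventions are the `ARROW:extension:name` and `ARROW:extension:metadata` field metadata keys.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the Parquet GEOMETRY/GEOGRAPHY annotation.
#[derive(Debug, Clone, PartialEq)]
enum GeoLogicalType {
    Geometry { crs: Option<String> },
    Geography { crs: Option<String>, algorithm: Option<String> },
}

/// Turn a Parquet CRS string into a JSON fragment.
/// `projjson:<key>` is looked up in the file key/value metadata (the stored
/// value is assumed to be valid JSON); anything else becomes a JSON string.
fn resolve_crs(crs: &str, file_metadata: &HashMap<String, String>) -> Option<String> {
    match crs.strip_prefix("projjson:") {
        Some(key) => file_metadata.get(key).cloned(),
        None => {
            let escaped = crs.replace('\\', "\\\\").replace('"', "\\\"");
            Some(format!("\"{}\"", escaped))
        }
    }
}

/// Build the Arrow field metadata for a `geoarrow.wkb` extension field.
fn geoarrow_field_metadata(
    ty: &GeoLogicalType,
    file_metadata: &HashMap<String, String>,
) -> HashMap<String, String> {
    let (crs, edges) = match ty {
        // Geometry has planar edges, so no "edges" key is written.
        GeoLogicalType::Geometry { crs } => (crs, None),
        // Assumption: an unset algorithm is treated as "spherical".
        GeoLogicalType::Geography { crs, algorithm } => {
            (crs, Some(algorithm.as_deref().unwrap_or("spherical")))
        }
    };

    let mut parts = Vec::new();
    // An unset or unresolvable CRS is simply omitted in this sketch.
    if let Some(json) = crs.as_deref().and_then(|c| resolve_crs(c, file_metadata)) {
        parts.push(format!("\"crs\": {}", json));
    }
    if let Some(e) = edges {
        parts.push(format!("\"edges\": \"{}\"", e));
    }

    let mut out = HashMap::new();
    out.insert("ARROW:extension:name".to_string(), "geoarrow.wkb".to_string());
    out.insert(
        "ARROW:extension:metadata".to_string(),
        format!("{{{}}}", parts.join(", ")),
    );
    out
}
```

The point of the sketch is that the reader needs access to the file-level key/value metadata at the moment it converts the column type, because the PROJJSON may live there instead of in the type itself.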
   
   I haven't looked at the existing type mapping code, but I think I remember 
reading that the recent `ExtensionType` change was followed up with the ability 
to inspect/generate field metadata on the way in/out of Parquet, so that 
metadata is propagated wherever possible.
   
   Right now GeoArrow extension types are listed as "community extension 
types", which I believe was a category made up just for us. It may be that 
moving/voting `geoarrow.wkb` to the "canonical extension type" category is a 
precursor to finalizing this implementation, which is definitely fair 🙂 .
   
   I'm happy to attempt this when I get a chance (unless @kylebarron is 
chomping at the bit to do it or has already done it!).

