paleolimbot commented on PR #45459: URL: https://github.com/apache/arrow/pull/45459#issuecomment-2691485598
@emkornfield @pitrou @wgtmac Thank you for your reviews! I think I managed to get the spirit of all of your comments throughout the diff, although I am sure I missed something and obviously feel free to let me know what it was! I think the outstanding issues are: - I merged the changes in from the branch I had been using to write the test files, so as is, this PR is fully functional for arbitrary cases of `write_parquet(geopandas_data_frame.to_arrow())` or `write_parquet(duckdb_rel.to_arrow_table())` or `write_parquet(sedona.spark.dataframe_arrow(spark_df))` with the extension type registered. I made the JSON dependency optional and separated the code so that it's easy to remove if that's not something we want. I'm totally game to remove that...we can do enough without parsing JSON that all of the reading/writing code is tested and we can solve the full utility later on (e.g., through a canonical extension type). - I re-read the "crs" section of the final spec and remembered very late that the `projjson:some_file_metadata_field` is a *recommendation* and not a requirement. It's obviously much easier to just dump GeoArrow's `"crs": ...` to Parquet's `crs` and vice versa and it would be very easy to convince me not to implement the recommendation (which would allow the removal of the `GeoCrsContext` from the `ArrowWriterProperties` and the `KeyValueMetadata` reference from the node context thing in the other direction). Alternatively, I'm happy to remove all non-lon/lat CRS handling (which still allows all of the read/write code to be tested) and figure that out later. - I am wondering if the R CI failures are because I added a `shared_ptr` to the `ArrowWriterProperties`. Happy to work through that if we decide we still want it there! (Or feel free to add something to this list!) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
