paleolimbot commented on code in PR #8801:
URL: https://github.com/apache/arrow-rs/pull/8801#discussion_r2505093123
##########
parquet-geospatial/src/crs.rs:
##########
@@ -0,0 +1,97 @@
+use std::{collections::HashMap, sync::Arc};
+
+use arrow_schema::{Schema, SchemaBuilder};
+use serde_json::{Value, json};
+
+#[derive(Debug)]
+pub enum Crs {
+ Projjson(serde_json::Value),
+ Srid(u64),
+ Other(String),
+}
+
+impl Crs {
+ // TODO: make fallible
+ fn try_from_parquet_str(crs: &str, metadata: &HashMap<String, String>) ->
Self {
+ let de: Value = serde_json::from_str(crs).unwrap();
+
+ // A CRS that does not exist or is empty defaults to 4326
+ // TODO: http link to parquet geospatial doc
+ let Some(crs) = de["crs"].as_str() else {
+ return Crs::Srid(4326);
+ };
Review Comment:
If you could guarantee that an "srid" value in either Parquet or GeoArrow
referred to an EPSG identifier, it would be safe to make that assumption;
however, neither make any guarantees about SRIDs and so writing a GeoArrow CRS
as `{"crs": "4326", "crs_type": "srid"}` actually won't get interpreted as
lon/lat in most cases, and there is no guarantee that a Parquet file with a crs
of `srid:4326` will be interpreted correctly (although it probably will be).
The canonical way to write "lon/lat" crses to Parquet is `None` (i.e., not
present); the canonical way to write "lon/lat" to GeoArrow is either `{"crs":
"OGC:CRS84", "crs_type": "authority_code"}` or the 4 KB PROJJSON string with
the full definition (although there are a few other ways it could possibly show
up in GeoArrow):
https://github.com/apache/sedona-db/blob/9cecb5b42e7ed2e5c234d61c24ea286ba3d58245/rust/sedona-schema/src/crs.rs#L318
Arrow C++ tries as ever hard as it possibly can to detect a GeoArrow CRS
that is lon/lat to ensure that the Parquet representation is `None`, and
converts a `None` Parquet crs to GeoArrow metadata of `{"crs": "OGC:CRS84",
"crs_type": "authority_code"}` because I didn't feel like explaining to
non-spatial reviewers why we needed 4 KB of JSON to solve that case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]