paleolimbot commented on code in PR #8801:
URL: https://github.com/apache/arrow-rs/pull/8801#discussion_r2505093123


##########
parquet-geospatial/src/crs.rs:
##########
@@ -0,0 +1,97 @@
+use std::{collections::HashMap, sync::Arc};
+
+use arrow_schema::{Schema, SchemaBuilder};
+use serde_json::{Value, json};
+
+#[derive(Debug)]
+pub enum Crs {
+    Projjson(serde_json::Value),
+    Srid(u64),
+    Other(String),
+}
+
+impl Crs {
+    // TODO: make fallible
+    fn try_from_parquet_str(crs: &str, metadata: &HashMap<String, String>) -> 
Self {
+        let de: Value = serde_json::from_str(crs).unwrap();
+
+        // A CRS that does not exist or is empty defaults to 4326
+        // TODO: http link to parquet geospatial doc
+        let Some(crs) = de["crs"].as_str() else {
+            return Crs::Srid(4326);
+        };

Review Comment:
   If you could guarantee that an "srid" value in either Parquet or GeoArrow 
referred to an EPSG identifier, it would be safe to make that assumption; 
however, neither make any guarantees about SRIDs and so writing a GeoArrow CRS 
as `{"crs": "4326", "crs_type": "srid"}` actually won't get interpreted as 
lon/lat in most cases, and there is no guarantee that a Parquet file with a crs 
of `srid:4326` will be interpreted correctly (although it probably will be).
   
   The canonical way to write "lon/lat" crses to Parquet is `None` (i.e., not 
present); the canonical way to write "lon/lat" to GeoArrow is either `{"crs": 
"OGC:CRS84", "crs_type": "authority_code"}` or the 4 KB PROJJSON string with 
the full definition (although there are a few other ways it could possibly show 
up in GeoArrow):
   
   
https://github.com/apache/sedona-db/blob/9cecb5b42e7ed2e5c234d61c24ea286ba3d58245/rust/sedona-schema/src/crs.rs#L318
   
   Arrow C++ tries as ever hard as it possibly can to detect a GeoArrow CRS 
that is lon/lat to ensure that the Parquet representation is `None`, and 
converts a `None` Parquet crs to GeoArrow metadata of `{"crs": "OGC:CRS84", 
"crs_type": "authority_code"}` because I didn't feel like explaining to 
non-spatial reviewers why we needed 4 KB of JSON to solve that case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to