paleolimbot commented on code in PR #8801:
URL: https://github.com/apache/arrow-rs/pull/8801#discussion_r2525584754


##########
parquet-geospatial/src/types.rs:
##########
@@ -0,0 +1,210 @@
+use std::sync::Arc;
+
+use arrow::array::{Array, ArrayRef, BinaryArray, BinaryViewArray, 
LargeBinaryArray, make_array};
+use arrow::buffer::NullBuffer;
+use arrow::error::Result;
+use arrow_schema::Field;
+use arrow_schema::{ArrowError, DataType, extension::ExtensionType};
+use serde::{Deserialize, Serialize};
+
+#[derive(Copy, Clone, Debug, Serialize, Deserialize)]
+pub enum Hint {
+    Geometry,
+    Geography,
+}
+
+#[derive(Clone, Debug, Default, Serialize, Deserialize)]
+pub struct Metadata {
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub crs: Option<String>, // TODO: explore when this is valid JSON to avoid 
double escaping
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub algorithm: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    type_hint: Option<Hint>,
+}

Review Comment:
   > If we want to support writing this case while preserving the (lack of) 
input metadata
   
   You're absolutely correct to spot this case! I am not personally worried 
about this one...I think you're safe to canonically export spherical edges as 
omitted if you'd like (this is what Arrow C++ will do today). Also if you're 
passionate about preserving this particular round trip, go for it!
   
   I may have missed a case, although I think that all of the metadata you're 
proposing here is correct for both the Parquet and GeoArrow specs (there are 
just multiple ways to represent some things and it's often useful to roundtrip 
things where possible?).
   
   Er, I suppose putting a `"type_hint"` key in GeoArrow extension metadata is 
a bit fishy although I don't think it will cause any existing parsers to error.



##########
parquet-geospatial/src/types.rs:
##########
@@ -0,0 +1,210 @@
+use std::sync::Arc;
+
+use arrow::array::{Array, ArrayRef, BinaryArray, BinaryViewArray, 
LargeBinaryArray, make_array};
+use arrow::buffer::NullBuffer;
+use arrow::error::Result;
+use arrow_schema::Field;
+use arrow_schema::{ArrowError, DataType, extension::ExtensionType};
+use serde::{Deserialize, Serialize};
+
+#[derive(Copy, Clone, Debug, Serialize, Deserialize)]
+pub enum Hint {
+    Geometry,
+    Geography,
+}
+
+#[derive(Clone, Debug, Default, Serialize, Deserialize)]
+pub struct Metadata {
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub crs: Option<String>, // TODO: explore when this is valid JSON to avoid 
double escaping
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub algorithm: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    type_hint: Option<Hint>,
+}
+
+impl Metadata {
+    pub fn new(crs: Option<String>, algorithm: Option<String>) -> Self {
+        Self {
+            crs,
+            algorithm,
+            type_hint: None,
+        }
+    }
+
+    pub fn with_type_hint(mut self, type_hint: Hint) -> Self {
+        self.type_hint = Some(type_hint);
+        self
+    }
+
+    pub fn set_type_hint(&mut self, type_hint: Hint) {
+        self.type_hint = Some(type_hint)
+    }
+
+    pub fn type_hint(&self) -> Option<Hint> {
+        if self.type_hint.is_some() {
+            return self.type_hint;
+        }
+
+        match &self.algorithm {
+            Some(s) if s.to_lowercase() == "planar" => Some(Hint::Geometry),
+            Some(_) => Some(Hint::Geography),
+            None => None,
+        }
+    }
+}
+
+#[derive(Debug, Default)]
+pub struct WkbType(Metadata);
+
+impl WkbType {
+    pub fn new_geometry(metadata: Option<Metadata>) -> Self {
+        Self(metadata.unwrap_or_default().with_type_hint(Hint::Geometry))
+    }
+
+    pub fn new_geography(metadata: Option<Metadata>) -> Self {
+        Self(metadata.unwrap_or_default().with_type_hint(Hint::Geography))
+    }
+}
+
+impl ExtensionType for WkbType {
+    const NAME: &'static str = "geoarrow.wkb";
+
+    type Metadata = Metadata;
+
+    fn metadata(&self) -> &Self::Metadata {
+        &self.0
+    }
+
+    fn serialize_metadata(&self) -> Option<String> {
+        serde_json::to_string(&self.0).ok()
+    }
+
+    fn deserialize_metadata(metadata: Option<&str>) -> Result<Self::Metadata> {
+        let Some(metadata) = metadata else {
+            return Ok(Self::Metadata::default());
+        };
+
+        serde_json::from_str(metadata).map_err(|e| 
ArrowError::JsonError(e.to_string()))
+    }
+
+    fn supports_data_type(&self, data_type: &arrow_schema::DataType) -> 
Result<()> {
+        match data_type {
+            DataType::Binary | DataType::LargeBinary | DataType::BinaryView => 
Ok(()),
+            dt => Err(ArrowError::InvalidArgumentError(format!(
+                "Geometry data type mismatch, expected one of Binary, 
LargeBinary, BinaryView. Found {dt}"
+            ))),
+        }
+    }
+
+    fn try_new(data_type: &arrow_schema::DataType, metadata: Self::Metadata) 
-> Result<Self> {
+        let wkb = Self(metadata);
+        wkb.supports_data_type(data_type)?;
+        Ok(wkb)
+    }
+}
+
+#[derive(Debug)]
+pub struct WkbArray {
+    inner: ArrayRef,
+    metadata: Metadata,
+}
+
+impl WkbArray {

Review Comment:
   (It seems reasonable from my end although I don't really know how the 
extension array/extension type/metadata is supposed to work from the arrow-rs 
end of things and Andrew is definitely the right person to answer that)



##########
parquet/src/geospatial/mod.rs:
##########
@@ -49,3 +49,198 @@
 pub mod accumulator;
 pub mod bounding_box;
 pub mod statistics;
+
+#[cfg(test)]
+mod tests {

Review Comment:
   Is it worth putting these tests in parquet/tests/geospatial.rs? There might 
be some helper functions there for some of this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to