BlakeOrth commented on code in PR #8801:
URL: https://github.com/apache/arrow-rs/pull/8801#discussion_r2528484583
##########
parquet-geospatial/src/types.rs:
##########
@@ -0,0 +1,210 @@
+use std::sync::Arc;
+
+use arrow::array::{Array, ArrayRef, BinaryArray, BinaryViewArray,
LargeBinaryArray, make_array};
+use arrow::buffer::NullBuffer;
+use arrow::error::Result;
+use arrow_schema::Field;
+use arrow_schema::{ArrowError, DataType, extension::ExtensionType};
+use serde::{Deserialize, Serialize};
+
+#[derive(Copy, Clone, Debug, Serialize, Deserialize)]
+pub enum Hint {
+ Geometry,
+ Geography,
+}
+
+#[derive(Clone, Debug, Default, Serialize, Deserialize)]
+pub struct Metadata {
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub crs: Option<String>, // TODO: explore when this is valid JSON to avoid
double escaping
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub algorithm: Option<String>,
+ #[serde(skip_serializing_if = "Option::is_none")]
+ type_hint: Option<Hint>,
+}
Review Comment:
Right, the `type_hint` (regardless of whether or not this would be tolerated
by existing parsers) more or less makes this metadata neither Parquet nor
GeoArrow. I think the more philosophical question of "should this metadata be
Parquet or GeoArrow" is important here. I'll preface this by saying I don't
really feel too strongly about this and my primary goal is to make sure this is
usable for the community at large, however, I'll at least detail my thought
process behind it.
I think even without the `type_hint` the metadata we present here is already
not really GeoArrow metadata, in spite of it generally looking compatible.
Since cases like `{"crs": "projjson:my_crs_metadata_key"}` don't actually
communicate the CRS without user (or other library, likely `geoarrow-rs`)
intervention there's already inherently some additional inspection of this
metdata necessary to convert it to GeoArrow. If users of the metadata are going
to need to inspect and potentially convert it anyway, we may as well allow the
metadata to support all the cases the underlying parquet spec supports.
Now, all that being said, I think we're in agreement that the case this
supports isn't a huge concern. If it makes most sense from an ecosystem
compatibility perspective to discard the `type_hint` and accept that we
technically have an unsupported case I think that's likely the strongest
justification for how to move forward here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]