wgtmac commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1803329291


##########
src/main/thrift/parquet.thrift:
##########
@@ -380,6 +410,38 @@ struct JsonType {
 struct BsonType {
 }
 
+/** Physical type and encoding for the geometry type */
+enum GeometryEncoding {
+  /**
+   * Allowed for physical type: BYTE_ARRAY.
+   *
+   * Well-known binary (WKB) representations of geometries.
+   */
+  WKB = 0;
+}
+
+/** Interpretation for edges of elements of a GEOMETRY type */
+enum Edges {
+  PLANAR = 0;
+  SPHERICAL = 1;
+}
+
+/**
+ * GEOMETRY logical type annotation (added in 2.11.0)
+ *
+ * GeometryEncoding and Edges are required. CRS is optional.
+ *
+ * Once CRS is set, it MUST be a key to an entry in the `key_value_metadata`
+ * field of `FileMetaData`.

Review Comment:
   > The need for embedding a full CRS description somewhere that is programmatically accessible by a Parquet implementation is to ensure a producer's intent can be faithfully transported by the consumer.
   
   To achieve this, could we reserve some CRS values, or at least some prefixes? For example, Iceberg could store `iceberg.xxx` in `crs`, where `xxx` is an arbitrary CRS identifier defined in its table metadata. Similarly, GeoParquet could set `crs` to `geoparquet.xxx`. That key must then exist in the `key_value_metadata` of the Parquet file, and its associated value is the full CRS.
   
   This still causes some fragmentation, but it seems better than strict enforcement. WDYT? @rdblue @jiayuasu @paleolimbot
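   To make the proposal concrete, here is a minimal Python sketch of how a reader might resolve the `crs` field under it. The reserved prefixes (`iceberg.`, `geoparquet.`) come from the examples above. Several details are assumptions for illustration and are not part of the spec or this PR: the function name `resolve_crs`, modeling `key_value_metadata` as (key, value) pairs, and passing unprefixed values through as opaque identifiers.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical reserved prefixes, taken from the examples in the comment.
RESERVED_CRS_PREFIXES = ("iceberg.", "geoparquet.")


def resolve_crs(
    crs: Optional[str],
    key_value_metadata: Sequence[Tuple[str, str]],
) -> Optional[str]:
    """Resolve the `crs` field of a GEOMETRY column (sketch of the proposal).

    Assumed semantics:
    - crs not set -> None
    - crs with a reserved prefix -> must be a key in key_value_metadata,
      and the associated value is the full CRS description
    - any other crs -> returned unchanged as an opaque identifier
    """
    if crs is None:
        return None
    # str.startswith accepts a tuple and matches any of its prefixes.
    if not crs.startswith(RESERVED_CRS_PREFIXES):
        return crs
    # FileMetaData.key_value_metadata is a list<KeyValue> in Thrift;
    # here it is modeled as (key, value) pairs and the first match wins.
    for key, value in key_value_metadata:
        if key == crs:
            return value
    raise ValueError(f"reserved CRS key {crs!r} not found in key_value_metadata")


# Usage, with a placeholder standing in for a real CRS definition:
kv = [("geoparquet.crs0", "<full CRS description>")]
print(resolve_crs("geoparquet.crs0", kv))  # the full CRS from the file
print(resolve_crs("OGC:CRS84", kv))        # passed through unchanged
```

   The "less strict" part of this design is that only the reserved namespaces are required to resolve to a full definition inside the file. Any other value stays an opaque string, and interpreting it is left to the consumer.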



-- 
This is an automated message from the Apache Git Service.