rdblue commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1800210033


##########
src/main/thrift/parquet.thrift:
##########
@@ -380,6 +410,38 @@ struct JsonType {
 struct BsonType {
 }
 
+/** Physical type and encoding for the geometry type */
+enum GeometryEncoding {
+  /**
+   * Allowed for physical type: BYTE_ARRAY.
+   *
+   * Well-known binary (WKB) representations of geometries.
+   */
+  WKB = 0;
+}
+
+/** Interpretation for edges of elements of a GEOMETRY type */
+enum Edges {
+  PLANAR = 0;
+  SPHERICAL = 1;
+}
+
+/**
+ * GEOMETRY logical type annotation (added in 2.11.0)
+ *
+ * GeometryEncoding and Edges are required. CRS is optional.
+ *
+ * Once CRS is set, it MUST be a key to an entry in the `key_value_metadata`
+ * field of `FileMetaData`.

Review Comment:
   Why is it required that the CRS is embedded in file metadata? Isn't it clear 
if the CRS is a well-known one like `OGC:CRS84`? It seems to me that this 
resolution should be out of scope. Parquet can encourage that the CRS is 
documented in file metadata, but other systems could store the definition in a 
different location. For example, Iceberg could store this in a table property 
instead of in each data file.
   
   I would prefer to define this string property as a "Coordinate reference 
system identifier" and not specify how to exchange the PROJJSON or other format 
definition. I would also add a note that people are encouraged to store it in a 
location along with the file or table metadata.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to