paleolimbot commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1772566650
##########
src/main/thrift/parquet.thrift:
##########
@@ -373,6 +505,78 @@ struct JsonType {
struct BsonType {
}
+/**
+ * Geometry logical type annotation (added in 2.11.0)
+ */
+struct GeometryType {
+ /**
+ * Physical type and encoding for the geometry type.
+ * Please refer to the definition of GeometryEncoding for more detail.
+ */
+ 1: required GeometryEncoding encoding;
+ /**
+ * Interpretation for edges of elements of a GEOMETRY logical type, i.e. whether
+ * the interpolation between points along an edge represents a straight cartesian
+ * line or the shortest line on the sphere.
+ * Please refer to the definition of Edges for more detail.
+ */
+ 2: required EdgeInterpolation edges;
+ /**
+ * Coordinate Reference System, i.e. mapping of how coordinates refer to
+ * precise locations on earth. Writers are not required to set this field.
+ * Once crs is set, crs_encoding field below MUST be set together.
+ * For example, "OGC:CRS84" can be set in the form of PROJJSON as below:
+ * {
+ * "$schema": "https://proj.org/schemas/v0.5/projjson.schema.json",
+ * "type": "GeographicCRS",
+ * "name": "WGS 84 longitude-latitude",
+ * "datum": {
+ * "type": "GeodeticReferenceFrame",
+ * "name": "World Geodetic System 1984",
+ * "ellipsoid": {
+ * "name": "WGS 84",
+ * "semi_major_axis": 6378137,
+ * "inverse_flattening": 298.257223563
+ * }
+ * },
+ * "coordinate_system": {
+ * "subtype": "ellipsoidal",
+ * "axis": [
+ * {
+ * "name": "Geodetic longitude",
+ * "abbreviation": "Lon",
+ * "direction": "east",
+ * "unit": "degree"
+ * },
+ * {
+ * "name": "Geodetic latitude",
+ * "abbreviation": "Lat",
+ * "direction": "north",
+ * "unit": "degree"
+ * }
+ * ]
+ * },
+ * "id": {
+ * "authority": "OGC",
+ * "code": "CRS84"
+ * }
+ * }
+ */
+ 3: optional string crs;
+ /**
+ * Encoding used in the above crs field. It MUST be set if crs field is set.
+ * Currently the only allowed value is "PROJJSON".
+ */
+ 4: optional string crs_encoding;
Review Comment:
The ability to include a parameterized CRS is absolutely essential for the
GEOMETRY type in Parquet to be useful: not all CRSes have been catalogued, and
many can't be because they're too specific (e.g., a CRS optimized for a small
locality or specific project, or the view of a satellite orbiting a planet) or
too old (e.g., a project of mine with the Canadian government that digitized
several decades of sea ice coverage, where the first four decades were in a CRS
that had never been catalogued but could be expressed in PROJJSON).
The `crs_encoding` piece is to make the `crs` string unambiguous. I happen
to think this is an improvement over many existing systems that just provide a
string and force the reader to guess the intent; however, it is not strictly
necessary (e.g., we could just define the CRS as a string).
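To illustrate what an explicit `crs_encoding` buys a reader, here is a minimal Python sketch that follows the rules stated in the diff comments above (`crs_encoding` must accompany `crs`, and "PROJJSON" is currently the only allowed value). The helper names `parse_crs` and `crs_identifier` are hypothetical and not part of any Parquet library; the CRS string is a trimmed version of the OGC:CRS84 example from the diff.

```python
import json

# Per the proposed spec text, "PROJJSON" is currently the only allowed value.
ALLOWED_CRS_ENCODINGS = {"PROJJSON"}


def parse_crs(crs, crs_encoding):
    """Hypothetical reader-side helper: return the CRS as a dict, or None if unset."""
    if crs is None:
        # Writers are not required to set the crs field.
        return None
    if crs_encoding is None:
        raise ValueError("crs is set but crs_encoding is missing")
    if crs_encoding not in ALLOWED_CRS_ENCODINGS:
        raise ValueError(f"unsupported crs_encoding: {crs_encoding!r}")
    # With the encoding declared, the reader knows to parse the string as JSON
    # instead of guessing between WKT, an authority code, a PROJ string, etc.
    return json.loads(crs)


def crs_identifier(projjson):
    """Return 'AUTHORITY:CODE' if the CRS carries an id, else None (custom CRS)."""
    ident = projjson.get("id")
    if ident is None:
        return None
    return f"{ident['authority']}:{ident['code']}"


# Trimmed version of the OGC:CRS84 example from the diff (assumption: only the
# fields needed for this demonstration are kept).
example_crs = json.dumps({
    "type": "GeographicCRS",
    "name": "WGS 84 longitude-latitude",
    "id": {"authority": "OGC", "code": "CRS84"},
})

parsed = parse_crs(example_crs, "PROJJSON")
identifier = crs_identifier(parsed)
```

A parameterized CRS that has never been catalogued would simply have no `id` member, so `crs_identifier` returns `None` while the full definition is still available in the parsed dict.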
Iceberg has a different set of use cases from Parquet. Parquet is useful to
geospatial practitioners operating at a smaller scale who need to deal with
these issues and want to use Parquet to do so. An identifier-based format may
fit Iceberg's use cases well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]