wgtmac commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1829427456
##########
LogicalTypes.md:
##########
@@ -767,6 +767,188 @@ optional group my_map (MAP_KEY_VALUE) {
}
```
+## Geospatial Types
+
+### GEOMETRY
+
+`GEOMETRY` is used for geometry features from [OGC – Simple feature access][simple-feature-access].
+See [Geospatial Notes](#geospatial-notes).
+
+The type has three type parameters:
+- `encoding`: A required enum value for annotated physical type and encoding
+ for the `GEOMETRY` type. See [Geometry Encoding](#geometry-encoding).
+- `edges`: A required enum value for interpretation for edges of elements of the
+ `GEOMETRY` type, i.e. whether the interpolation between points along
+ an edge represents a straight cartesian line or the shortest line on
+ the sphere. See [Edges](#edges).
+- `crs`: An optional string value for CRS (coordinate reference system), which
+ is a mapping of how coordinates refer to precise locations on earth.
+ See [Coordinate Reference System](#coordinate-reference-system).
+
+The sort order used for `GEOMETRY` is undefined. When writing data, no min/max
+statistics should be saved for this type and if such non-compliant statistics
+are found during reading, they must be ignored. Instead, [GeometryStatistics](#geometry-statistics)
+is introduced for `GEOMETRY` type.
+
+#### Geometry Encoding
+
+Physical type and encoding for the `GEOMETRY` type. Supported values:
+- `WKB`: `GEOMETRY` type with `WKB` encoding can only be used to annotate the
+ `BYTE_ARRAY` primitive type. See [WKB](#well-known-binary-wkb).
+
+Note that geometry encoding is required for `GEOMETRY` type. In order to correctly
+interpret geometry data, writer implementations SHOULD always set this field, and
+reader implementations SHOULD fail for an unknown geometry encoding value.
+
+##### Well-known binary (WKB)
+
+Well-known binary (WKB) representations of geometries, see [Geospatial Notes](#geospatial-notes).
+
+To be clear, we follow the same definitions of GeoParquet for [WKB][geoparquet-wkb]
+and [coordinate axis order][coordinate-axis-order]:
+- Geometries SHOULD be encoded as ISO WKB supporting XY, XYZ, XYM, XYZM. Supported
+standard geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString,
+MultiPolygon, and GeometryCollection.
+- Coordinate axis order is always (x, y) where x is easting or longitude, and
+y is northing or latitude. This ordering explicitly overrides the axis order
+as specified in the CRS following the [GeoPackage specification][geopackage-spec].
+
+This is the preferred encoding for maximum portability.
+
+[geoparquet-wkb]: https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L92
+[coordinate-axis-order]: https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L155
+[geopackage-spec]: https://www.geopackage.org/spec130/#gpb_spec
+
+#### Edges
+
+Interpretation for edges of elements of `GEOMETRY` type. In other words, it
+specifies how a point between two vertices should be interpolated in its XY
+dimensions. Supported values and corresponding interpolation approaches are:
+- `PLANAR`: a Cartesian line connecting the two vertices.
+- `SPHERICAL`: a shortest spherical arc between the longitude and latitude
+ represented by the two vertices.
+
+This value applies to all non-point geometry objects and is independent of the
+[Coordinate Reference System](#coordinate-reference-system).
+
+Because most systems currently assume planar edges and do not support spherical
+edges, `PLANAR` should be used as the default value.
+
+Note that edges is required for `GEOMETRY` type. In order to correctly
+interpret geometry data, writer implementations SHOULD always set this field,
+and reader implementations SHOULD fail for an unknown edges value.
+
+#### Coordinate Reference System
+
+CRS (coordinate reference system) is a mapping of how coordinates refer to
+precise locations on earth. A CRS is specified by a key-value entry in the
+`key_value_metadata` field of `FileMetaData` whose key is a short name of
+the CRS and value is the CRS representation. An additional entry in the
+`key_value_metadata` field with the suffix ".type" is required to describe
+the encoding of this CRS representation.
+
+For example, if a geometry column (e.g., "geom1") uses the CRS "OGC:CRS84", the
+writer may write two entries to `key_value_metadata` field of `FileMetaData` as
+below, and set the `crs` field of the `GEOMETRY` type to "geom1_crs":
+```
+ "geom1_crs": an UTF-8 encoded PROJJSON representation of OGC:CRS84
+ "geom1_crs.type": "PROJJSON"
+```
+
+The PROJJSON representation of OGC:CRS84 can be seen at [OGC:CRS84][ogc-crs84].
+Multiple geometry columns can refer to the same CRS metadata field
+(e.g., "geom1_crs") if they share the same CRS.
+
+[ogc-crs84]: https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#ogccrs84-details
+
+#### Geometry Statistics
+
+`GeometryStatistics` is a struct to store geometry statistics of a column chunk
Review Comment:
The existing statistics would require encoding geometry data into a binary
representation, which is non-trivial. We might also add more kinds of geometry
statistics, so it is worth having a dedicated type.
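
To illustrate the point, here is a rough sketch of the kind of statistic a dedicated type could hold. It assumes a simple XY bounding box; `BoundingBox` and `chunk_bbox` are hypothetical names for illustration only and are not the `GeometryStatistics` struct proposed in this PR. Unlike min/max values, a box like this is plain numbers and does not need a geometry serialized to bytes or a defined sort order.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class BoundingBox:
    """Hypothetical XY bounding box; not the struct defined in the PR."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            other.xmin > self.xmax or other.xmax < self.xmin
            or other.ymin > self.ymax or other.ymax < self.ymin
        )


def chunk_bbox(points: Iterable[Tuple[float, float]]) -> Optional[BoundingBox]:
    """Accumulate a bounding box over all (x, y) vertices of a column chunk.

    Returns None when the chunk contains no vertices.
    """
    box: Optional[BoundingBox] = None
    for x, y in points:
        if box is None:
            box = BoundingBox(x, y, x, y)
        else:
            box = BoundingBox(
                min(box.xmin, x), min(box.ymin, y),
                max(box.xmax, x), max(box.ymax, y),
            )
    return box
```

A reader could then skip a column chunk when its box does not intersect the query window, which is something byte-wise min/max over WKB values cannot express.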