wgtmac commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1829431953
##########
LogicalTypes.md:
##########
@@ -767,6 +767,188 @@ optional group my_map (MAP_KEY_VALUE) {
}
```
+## Geospatial Types
+
+### GEOMETRY
+
+`GEOMETRY` is used for geometry features from [OGC – Simple feature access][simple-feature-access].
+See [Geospatial Notes](#geospatial-notes).
+
+The type has three type parameters:
+- `encoding`: A required enum value for annotated physical type and encoding
+ for the `GEOMETRY` type. See [Geometry Encoding](#geometry-encoding).
+- `edges`: A required enum value for interpretation for edges of elements of the
+ `GEOMETRY` type, i.e. whether the interpolation between points along
+ an edge represents a straight cartesian line or the shortest line on
+ the sphere. See [Edges](#edges).
+- `crs`: An optional string value for CRS (coordinate reference system), which
+ is a mapping of how coordinates refer to precise locations on earth.
+ See [Coordinate Reference System](#coordinate-reference-system).
+
+The sort order used for `GEOMETRY` is undefined. When writing data, no min/max
+statistics should be saved for this type and if such non-compliant statistics
+are found during reading, they must be ignored. Instead, [GeometryStatistics](#geometry-statistics)
+is introduced for `GEOMETRY` type.
+
+#### Geometry Encoding
+
+Physical type and encoding for the `GEOMETRY` type. Supported values:
+- `WKB`: `GEOMETRY` type with `WKB` encoding can only be used to annotate the
+ `BYTE_ARRAY` primitive type. See [WKB](#well-known-binary-wkb).
+
+Note that geometry encoding is required for `GEOMETRY` type. In order to correctly
+interpret geometry data, writer implementations SHOULD always set this field, and
+reader implementations SHOULD fail for an unknown geometry encoding value.
+
+##### Well-known binary (WKB)
+
+Well-known binary (WKB) representations of geometries, see [Geospatial Notes](#geospatial-notes).
+
+To be clear, we follow the same definitions of GeoParquet for [WKB][geoparquet-wkb]
+and [coordinate axis order][coordinate-axis-order]:
+- Geometries SHOULD be encoded as ISO WKB supporting XY, XYZ, XYM, XYZM. Supported
+standard geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString,
+MultiPolygon, and GeometryCollection.
+- Coordinate axis order is always (x, y) where x is easting or longitude, and
+y is northing or latitude. This ordering explicitly overrides the axis order
+as specified in the CRS following the [GeoPackage specification][geopackage-spec].
+
+This is the preferred encoding for maximum portability.
+
+[geoparquet-wkb]: https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L92
+[coordinate-axis-order]: https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L155
+[geopackage-spec]: https://www.geopackage.org/spec130/#gpb_spec
+
+#### Edges
+
+Interpretation for edges of elements of `GEOMETRY` type. In other words, it
+specifies how a point between two vertices should be interpolated in its XY
+dimensions. Supported values and corresponding interpolation approaches are:
+- `PLANAR`: a Cartesian line connecting the two vertices.
+- `SPHERICAL`: a shortest spherical arc between the longitude and latitude
+ represented by the two vertices.
+
+This value applies to all non-point geometry objects and is independent of the
+[Coordinate Reference System](#coordinate-reference-system).
+
+Because most systems currently assume planar edges and do not support spherical
+edges, `PLANAR` should be used as the default value.
+
+Note that edges is required for `GEOMETRY` type. In order to correctly
+interpret geometry data, writer implementations SHOULD always set this field,
+and reader implementations SHOULD fail for an unknown edges value.
+
+#### Coordinate Reference System
+
+CRS (coordinate reference system) is a mapping of how coordinates refer to
+precise locations on earth. A CRS is specified by a key-value entry in the
+`key_value_metadata` field of `FileMetaData` whose key is a short name of
+the CRS and value is the CRS representation. An additional entry in the
+`key_value_metadata` field with the suffix ".type" is required to describe
+the encoding of this CRS representation.
+
+For example, if a geometry column (e.g., "geom1") uses the CRS "OGC:CRS84", the
+writer may write two entries to `key_value_metadata` field of `FileMetaData` as
+below, and set the `crs` field of the `GEOMETRY` type to "geom1_crs":
+```
+ "geom1_crs": a UTF-8 encoded PROJJSON representation of OGC:CRS84
+ "geom1_crs.type": "PROJJSON"
+```
+
+The PROJJSON representation of OGC:CRS84 can be seen at [OGC:CRS84][ogc-crs84].
+Multiple geometry columns can refer to the same CRS metadata field
+(e.g., "geom1_crs") if they share the same CRS.
+
+[ogc-crs84]: https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#ogccrs84-details
+
+#### Geometry Statistics
+
+`GeometryStatistics` is a struct to store geometry statistics of a column chunk
+of `GEOMETRY` type. It is an optional field of `ColumnMetaData` and contains
+[Bounding Box](#bounding-box) and [Geometry Types](#geometry-types).
+
+##### Bounding Box
+
+A geometry has at least two coordinate dimensions: X and Y for 2D coordinates
+of each point. A geometry can optionally have Z and / or M values associated
+with each point in the geometry.
+
+The Z values introduce the third dimension coordinate. Usually they are used
+to indicate the height, or elevation.
+
+M values are an opportunity for a geometry to express a fourth dimension as
+a coordinate value. These values can be used as a linear reference value
+(e.g., highway milepost value), a timestamp, or some other value as defined
+by the CRS.
+
+The bounding box is defined as the thrift struct below, which stores a
+min/max value pair of coordinates for each axis. Note that X and Y values
+are always present. Z and M are omitted for 2D geometries.
+
+```thrift
+struct BoundingBox {
+ /** Min X value when edges = PLANAR, westmost value if edges = SPHERICAL */
+ 1: required double xmin;
+ /** Max X value when edges = PLANAR, eastmost value if edges = SPHERICAL */
+ 2: required double xmax;
+ /** Min Y value when edges = PLANAR, southmost value if edges = SPHERICAL */
+ 3: required double ymin;
+ /** Max Y value when edges = PLANAR, northmost value if edges = SPHERICAL */
+ 4: required double ymax;
+ /** Min Z value if the axis exists */
+ 5: optional double zmin;
+ /** Max Z value if the axis exists */
+ 6: optional double zmax;
+ /** Min M value if the axis exists */
+ 7: optional double mmin;
+ /** Max M value if the axis exists */
+ 8: optional double mmax;
+}
+```
+
+The meaning of each value depends on the `Edges` attribute of the `GEOMETRY` type:
+- If Edges is `PLANAR`, the values are literally the actual min/max value from each axis.
+- If Edges is `SPHERICAL`, the values for X and Y are `[westmost, eastmost, southmost, northmost]`,
+ with necessary min/max values for Z and M if needed.
+
+##### Geometry Types
+
+A list of geometry types from all geometries in the `GEOMETRY` column, or an
+empty list if they are not known.
+
+This is borrowed from [geometry_types of GeoParquet][geometry-types]
+except that values in the list are [WKB (ISO-variant) integer codes][wkb-integer-code].
+The table below shows the most common geometry types and their codes:
+
+| Type | XY | XYZ | XYM | XYZM |
+| :----------------- | :--- | :--- | :--- | :--: |
+| Point | 0001 | 1001 | 2001 | 3001 |
+| LineString | 0002 | 1002 | 2002 | 3002 |
+| Polygon | 0003 | 1003 | 2003 | 3003 |
+| MultiPoint | 0004 | 1004 | 2004 | 3004 |
+| MultiLineString | 0005 | 1005 | 2005 | 3005 |
+| MultiPolygon | 0006 | 1006 | 2006 | 3006 |
+| GeometryCollection | 0007 | 1007 | 2007 | 3007 |
+
+In addition, the following rules are applied:
+- A list of multiple values indicates that multiple geometry types are present (e.g. `[0003, 0006]`).
+- An empty array explicitly signals that the geometry types are not known.
+- The geometry types in the list must be unique (e.g. `[0001, 0001]` is not valid).
Review Comment:
IIUC, it should be empty if we have any invalid geometry data in the column.
`ST_IsValid` deals with this case: https://postgis.net/docs/ST_IsValid.html.
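
A minimal sketch of how a writer could apply this reading of the rule, not a definitive implementation. The names `wkb_type_code`, `geometry_types`, and the caller-supplied `is_valid` predicate are hypothetical and not part of the PR; `is_valid` stands in for whatever validity check the writer has available (for example something equivalent to `ST_IsValid`). The only thing taken from the spec text is that the list holds unique ISO WKB integer codes and that an empty list means "not known".

```python
import struct
from typing import Callable, Iterable, List


def wkb_type_code(wkb: bytes) -> int:
    """Read the geometry type code from the 5-byte WKB header."""
    if len(wkb) < 5:
        raise ValueError("WKB value is too short to contain a header")
    # Byte 0 is the byte-order flag: 0 = big endian, 1 = little endian.
    if wkb[0] not in (0, 1):
        raise ValueError("unknown WKB byte-order flag")
    fmt = "<I" if wkb[0] == 1 else ">I"
    # Bytes 1..4 hold the type code as an unsigned 32-bit integer
    # (e.g. 1 = Point XY, 1001 = Point XYZ in the ISO variant).
    (code,) = struct.unpack_from(fmt, wkb, 1)
    return code


def geometry_types(
    values: Iterable[bytes],
    is_valid: Callable[[bytes], bool],  # hypothetical, supplied by the caller
) -> List[int]:
    """Collect unique type codes; fall back to 'not known' on invalid data."""
    codes = set()
    for wkb in values:
        if not is_valid(wkb):
            # Assumption from the comment above: any invalid geometry
            # makes the whole list unknown, signalled by an empty list.
            return []
        codes.add(wkb_type_code(wkb))
    return sorted(codes)
```

With this shape, a column chunk containing a single invalid value reports the same empty list as one whose types were never computed, so readers need no extra flag.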
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]