mentin commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1832027057
##########
LogicalTypes.md:
##########
@@ -767,6 +767,188 @@ optional group my_map (MAP_KEY_VALUE) {
}
```
+## Geospatial Types
+
+### GEOMETRY
+
+`GEOMETRY` is used for geometry features from [OGC – Simple feature access][simple-feature-access].
+See [Geospatial Notes](#geospatial-notes).
+
+The type has three type parameters:
+- `encoding`: A required enum value for annotated physical type and encoding
+ for the `GEOMETRY` type. See [Geometry Encoding](#geometry-encoding).
+- `edges`: A required enum value for interpretation for edges of elements of the
+ `GEOMETRY` type, i.e. whether the interpolation between points along
+ an edge represents a straight cartesian line or the shortest line on
+ the sphere. See [Edges](#edges).
+- `crs`: An optional string value for CRS (coordinate reference system), which
+ is a mapping of how coordinates refer to precise locations on earth.
+ See [Coordinate Reference System](#coordinate-reference-system).
+
+The sort order used for `GEOMETRY` is undefined. When writing data, no min/max
+statistics should be saved for this type and if such non-compliant statistics
+are found during reading, they must be ignored. Instead, [GeometryStatistics](#geometry-statistics)
+is introduced for `GEOMETRY` type.
+
+#### Geometry Encoding
+
+Physical type and encoding for the `GEOMETRY` type. Supported values:
+- `WKB`: `GEOMETRY` type with `WKB` encoding can only be used to annotate the
+ `BYTE_ARRAY` primitive type. See [WKB](#well-known-binary-wkb).
+
+Note that geometry encoding is required for `GEOMETRY` type. In order to correctly
+interpret geometry data, writer implementations SHOULD always set this field, and
+reader implementations SHOULD fail for an unknown geometry encoding value.
+
+##### Well-known binary (WKB)
+
+Well-known binary (WKB) representations of geometries, see [Geospatial Notes](#geospatial-notes).
+
+To be clear, we follow the same definitions of GeoParquet for [WKB][geoparquet-wkb]
+and [coordinate axis order][coordinate-axis-order]:
+- Geometries SHOULD be encoded as ISO WKB supporting XY, XYZ, XYM, XYZM. Supported
+standard geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString,
+MultiPolygon, and GeometryCollection.
+- Coordinate axis order is always (x, y) where x is easting or longitude, and
+y is northing or latitude. This ordering explicitly overrides the axis order
+as specified in the CRS following the [GeoPackage specification][geopackage-spec].
+
+This is the preferred encoding for maximum portability.
+
+[geoparquet-wkb]: https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L92
+[coordinate-axis-order]: https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L155
+[geopackage-spec]: https://www.geopackage.org/spec130/#gpb_spec
+
+#### Edges
+
+Interpretation for edges of elements of `GEOMETRY` type. In other words, it
+specifies how a point between two vertices should be interpolated in its XY
+dimensions. Supported values and corresponding interpolation approaches are:
+- `PLANAR`: a Cartesian line connecting the two vertices.
+- `SPHERICAL`: a shortest spherical arc between the longitude and latitude
+ represented by the two vertices.
+
+This value applies to all non-point geometry objects and is independent of the
+[Coordinate Reference System](#coordinate-reference-system).
Review Comment:
> The engine capable of interpreting edges as geodesics should do so if the
> CRS reference indicates that the underlying geometry column belongs to an
> ellipsoid datum.

Consider the most common case, SRID 4326. It is a geographic coordinate system
(GEOGCS) rather than a projected one:
https://www.esri.com/arcgis-blog/products/arcgis-pro/mapping/gcs_vs_pcs/
So a linestring from A to B should follow the geodesic line. But most systems
treat 4326 as a planar map. For example, the **Geometry** type in PostGIS or
MS SQL Server treats it as a projected coordinate system, and linestrings
follow straight lines on a flat surface. With recent MySQL versions, or with
the **Geography** type in PostGIS or MS SQL Server, linestrings in 4326 follow
geodesic lines on the sphere. So it is ambiguous what exactly a linestring or
polygon in 4326 describes. Is `point(30 21)` inside
`polygon((10 10, 50 10, 50 20, 10 20, 10 10))`?
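With geometry, in PostGIS:
```
-- Planar edges: the edge from (50 20) to (10 20) is the straight line y = 20,
-- so the point at y = 21 lies outside the polygon and this returns false.
select st_intersects(
  st_geomfromtext('polygon((10 10, 50 10, 50 20, 10 20, 10 10))', 4326),
  st_geomfromtext('point(30 21)', 4326));
```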
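The same test with geography (SRID 4326 is assumed):
```
-- Geodesic edges: the edge from (50 20) to (10 20) bulges north to about
-- latitude 21.17 at longitude 30, so the point is inside and this returns true.
select st_intersects(
  st_geographyfromtext('srid=4326;polygon((10 10, 50 10, 50 20, 10 20, 10 10))'),
  st_geographyfromtext('srid=4326;point(30 21)'));
```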
Unfortunately, there is no accepted way to describe the difference between
geometry and geography in the WKB format. You can encounter SRID=4326 with both
interpretations. The `edges` attribute describes the difference between
geometry and geography, and tells the user how to interpret the data in a way
consistent with the system that produced it.
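
For readers without a PostGIS instance at hand, the difference can be checked numerically. The following is only a rough sketch, assuming a perfect sphere (not the WGS 84 ellipsoid) and looking only at the polygon's northern edge, which is the edge that decides this particular case. The helper names `to_unit_vector` and `great_circle_midpoint` are made up for illustration and do not come from any library.

```python
import math


def to_unit_vector(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to a 3D unit vector on the sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def great_circle_midpoint(a, b):
    """Return (lon, lat) in degrees of the midpoint of the shortest arc a-b."""
    ax, ay, az = to_unit_vector(*a)
    bx, by, bz = to_unit_vector(*b)
    # The normalized sum of the two endpoint vectors bisects the arc.
    mx, my, mz = ax + bx, ay + by, az + bz
    norm = math.sqrt(mx * mx + my * my + mz * mz)
    return (math.degrees(math.atan2(my, mx)),
            math.degrees(math.asin(mz / norm)))


# Northern edge of the polygon from the comment, and the query point.
edge_start, edge_end = (10.0, 20.0), (50.0, 20.0)
point_lon, point_lat = 30.0, 21.0

# PLANAR: the edge is the straight line y = 20 at every longitude.
planar_edge_lat = 20.0

# SPHERICAL: the edge is a great-circle arc; evaluate it at its midpoint,
# which by symmetry sits at the query point's longitude (30).
mid_lon, spherical_edge_lat = great_circle_midpoint(edge_start, edge_end)

south_of_planar_edge = point_lat <= planar_edge_lat
south_of_spherical_edge = point_lat <= spherical_edge_lat

print(f"spherical edge at lon {mid_lon:.1f}: lat {spherical_edge_lat:.2f}")
print(f"PLANAR: point south of northern edge? {south_of_planar_edge}")
print(f"SPHERICAL: point south of northern edge? {south_of_spherical_edge}")
```

Under these assumptions the spherical edge reaches roughly latitude 21.17 at longitude 30, so the same WKB coordinates put the point outside the polygon with `PLANAR` edges and inside it with `SPHERICAL` edges.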