mentin commented on code in PR #240:
URL: https://github.com/apache/parquet-format/pull/240#discussion_r1832027057


##########
LogicalTypes.md:
##########
@@ -767,6 +767,188 @@ optional group my_map (MAP_KEY_VALUE) {
 }
 ```
 
+## Geospatial Types
+
+### GEOMETRY
+
+`GEOMETRY` is used for geometry features from [OGC – Simple feature 
access][simple-feature-access].
+See [Geospatial Notes](#geospatial-notes).
+
+The type has three type parameters:
+- `encoding`: A required enum value for annotated physical type and encoding
+              for the `GEOMETRY` type. See [Geometry 
Encoding](#geometry-encoding).
+- `edges`: A required enum value for interpretation for edges of elements of 
the
+           `GEOMETRY` type, i.e. whether the interpolation between points along
+           an edge represents a straight cartesian line or the shortest line on
+           the sphere. See [Edges](#edges).
+- `crs`: An optional string value for CRS (coordinate reference system), which
+         is a mapping of how coordinates refer to precise locations on earth.
+         See [Coordinate Reference System](#coordinate-reference-system).
+
+The sort order used for `GEOMETRY` is undefined. When writing data, no min/max
+statistics should be saved for this type and if such non-compliant statistics
+are found during reading, they must be ignored. Instead, 
[GeometryStatistics](#geometry-statistics)
+is introduced for `GEOMETRY` type.
+
+#### Geometry Encoding
+
+Physical type and encoding for the `GEOMETRY` type. Supported values:
+- `WKB`: `GEOMETRY` type with `WKB` encoding can only be used to annotate the
+         `BYTE_ARRAY` primitive type. See [WKB](#well-known-binary-wkb).
+
+Note that geometry encoding is required for `GEOMETRY` type. In order to 
correctly
+interpret geometry data, writer implementations SHOULD always set this field, 
and
+reader implementations SHOULD fail for an unknown geometry encoding value.
+
+##### Well-known binary (WKB)
+
+Well-known binary (WKB) representations of geometries, see [Geospatial 
Notes](#geospatial-notes).
+
+To be clear, we follow the same definitions of GeoParquet for 
[WKB][geoparquet-wkb]
+and [coordinate axis order][coordinate-axis-order]:
+- Geometries SHOULD be encoded as ISO WKB supporting XY, XYZ, XYM, XYZM. 
Supported
+standard geometry types: Point, LineString, Polygon, MultiPoint, 
MultiLineString,
+MultiPolygon, and GeometryCollection.
+-  Coordinate axis order is always (x, y) where x is easting or longitude, and
+y is northing or latitude. This ordering explicitly overrides the axis order
+as specified in the CRS following the [GeoPackage 
specification][geopackage-spec].
+
+This is the preferred encoding for maximum portability.
+
+[geoparquet-wkb]: 
https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L92
+[coordinate-axis-order]: 
https://github.com/opengeospatial/geoparquet/blob/v1.1.0/format-specs/geoparquet.md?plain=1#L155
+[geopackage-spec]: https://www.geopackage.org/spec130/#gpb_spec
+
+#### Edges
+
+Interpretation for edges of elements of `GEOMETRY` type. In other words, it
+specifies how a point between two vertices should be interpolated in its XY
+dimensions. Supported values and corresponding interpolation approaches are:
+- `PLANAR`: a Cartesian line connecting the two vertices.
+- `SPHERICAL`: a shortest spherical arc between the longitude and latitude
+               represented by the two vertices.
+
+This value applies to all non-point geometry objects and is independent of the
+[Coordinate Reference System](#coordinate-reference-system).

Review Comment:
   > The engine capable of interpreting edges as geodesics should do so if the 
CRS reference indicates that the underlying geometry column belongs to an 
ellipsoid datum.
   
   Consider the most common case, SRID 4326. It is a geographic coordinate
   system (GEOGCS) rather than a projected one.
   https://www.esri.com/arcgis-blog/products/arcgis-pro/mapping/gcs_vs_pcs/
   
   So a linestring from A to B should follow the geodesic line. But most
   systems treat 4326 as a planar map. For example, the **Geometry** type in
   PostGIS or MS SQL Server treats it as a projected coordinate system, and
   linestrings follow straight lines on a flat surface. With the latest MySQL,
   or the **Geography** type in PostGIS or MS SQL Server, linestrings in 4326
   follow geodesic lines on a sphere. So it is ambiguous what exactly a
   linestring or polygon in 4326 describes. Is `point(30 21)` inside
   `polygon((10 10, 50 10, 50 20, 10 20, 10 10))`?
   
   With geometry in PostGIS, this returns **false**:
   ```
   select st_intersects(
     st_geomfromtext('polygon((10 10, 50 10, 50 20, 10 20, 10 10))', 4326), 
     st_geomfromtext('point(30 21)', 4326));
   ```
   The same query with geography (4326 is presumed) returns **true**:
   ```
   select st_intersects(
     st_geographyfromtext('srid=4326;polygon((10 10, 50 10, 50 20, 10 20, 10 10))'),
     st_geographyfromtext('srid=4326;point(30 21)'));
   ```
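
The two results can be reproduced by hand. Here is a minimal sketch, assuming a perfect sphere and checking only latitudes at longitude 30 (which is the midpoint longitude of the polygon's top and bottom edges, so the midpoint of each great-circle arc is enough). The helper names are made up for illustration, and this is not how PostGIS evaluates the predicate:

```python
import math

def to_unit_vector(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to a 3D unit vector on the sphere."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def spherical_midpoint_lat(a, b):
    """Latitude (degrees) of the midpoint of the great-circle arc from a to b.

    a and b are (lon, lat) pairs. The midpoint is the normalized sum of the
    two unit vectors (valid when the points are not antipodal).
    """
    va, vb = to_unit_vector(*a), to_unit_vector(*b)
    summed = [p + q for p, q in zip(va, vb)]
    norm = math.sqrt(sum(c * c for c in summed))
    return math.degrees(math.asin(summed[2] / norm))

point_lat = 21.0  # point(30 21)

# PLANAR edges: the top edge stays at latitude 20, the bottom edge at 10.
top_planar, bottom_planar = 20.0, 10.0
inside_planar = bottom_planar <= point_lat <= top_planar

# SPHERICAL edges: both arcs bulge toward the pole between their endpoints.
top_spherical = spherical_midpoint_lat((10, 20), (50, 20))
bottom_spherical = spherical_midpoint_lat((10, 10), (50, 10))
inside_spherical = bottom_spherical <= point_lat <= top_spherical

print(f"planar: top edge at lat {top_planar:.2f}, inside = {inside_planar}")
print(f"spherical: top edge at lat {top_spherical:.2f}, inside = {inside_spherical}")
```

Under the spherical interpretation the top edge reaches roughly latitude 21.17 at longitude 30, so the point at latitude 21 falls inside; under the planar interpretation the edge stays at latitude 20 and the point is outside.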
   
   Unfortunately, there is no accepted way to describe the difference between
   geometry and geography in WKB format. You can encounter SRID=4326 with both
   interpretations. The `edges` attribute describes the difference between
   geometry and geography, and tells the user how to interpret the data in a
   way consistent with the system that produced it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

