szehon-ho commented on code in PR #55207:
URL: https://github.com/apache/spark/pull/55207#discussion_r3124378205

##########
docs/sql-ref-geospatial-types.md:
##########
@@ -143,6 +143,49 @@ SELECT ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F00000000000
 * **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can have different SRIDs. Only valid SRIDs are allowed.
 * **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a fixed SRID per column; mixed-SRID types are for in-memory/query use. When writing to these formats, a concrete (fixed) SRID is required.

+### Supported SRIDs
+
+Spark includes a pre-built SRID registry with OGC standard overrides for commonly used coordinate systems. This registry enables validation and proper handling of coordinate systems for geospatial data.
+
+**SRID Compatibility Rules:**
+- **GEOMETRY** accepts all SRIDs in the registry (geographic + projected + SRID 0)
+- **GEOGRAPHY** only accepts geographic SRIDs (latitude/longitude coordinate systems)
+
+#### Commonly Used SRIDs
+
+| SRID | CRS Identifier | Name | CRS Type | Description |
+|------|----------------|------|----------|-------------|
+| 0 | `SRID:0` | Unspecified | Cartesian | Coordinates with no defined CRS (default for `ST_GeomFromWKB(wkb)`) |
+| 4326 | `OGC:CRS84` | WGS 84 | Geographic | World Geodetic System 1984 (longitude/latitude), GPS coordinates, global data (default for GEOGRAPHY) |
+| 4267 | `OGC:CRS27` | NAD27 | Geographic | North American Datum 1927 |
+| 4269 | `OGC:CRS83` | NAD83 | Geographic | North American Datum 1983 |
+| 3857 | `EPSG:3857` | Web Mercator | Projected | Pseudo-Mercator projection used by web mapping services |
+
+**Notes:**
+* `GEOMETRY(0)` means a fixed SRID of 0 (all rows use SRID 0), not per-row SRIDs.
+  For per-row SRIDs, use `GEOMETRY(ANY)`.
+* `GEOMETRY(ANY)` and `GEOGRAPHY(ANY)` are valid for in-memory and query use, but cannot be
+  persisted — the [Parquet](https://github.com/apache/parquet-format/blob/master/Geospatial.md)
+  and [Iceberg](https://github.com/apache/iceberg/blob/master/format/spec.md) geospatial
+  specifications require a fixed SRID per column.
+* The registry is based on PROJ 9.7.1 and includes both EPSG and ESRI coordinate systems.

Review Comment:
Yeah, something like a 'since version' table maybe? Example from other docs: https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html

Yes, separate tables. I am thinking:
* PROJ version table
* OGC overrides table
* Commonly used SRIDs

My initial hunch was that there's no huge value in 'commonly used SRIDs', as they can probably be found on the web, but I can go either way. The first two are more important imo, as they describe exactly the algorithm for selecting supported SRIDs.
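
For illustration only, the SRID compatibility rules quoted in the hunk (GEOMETRY accepts every SRID in the registry, GEOGRAPHY accepts only geographic ones) can be modeled with a minimal Python sketch. The `SRID_TABLE` dictionary and the `is_srid_allowed` helper are hypothetical names; the entries are just the five rows of the "Commonly Used SRIDs" table from the diff. This is not Spark's registry or API, and the real registry is much larger.

```python
# Hypothetical model of the compatibility rules in the proposed doc text.
# Entries are copied from the "Commonly Used SRIDs" table in the diff;
# this is NOT Spark's actual registry or API.
SRID_TABLE = {
    0: ("SRID:0", "Cartesian"),
    4326: ("OGC:CRS84", "Geographic"),
    4267: ("OGC:CRS27", "Geographic"),
    4269: ("OGC:CRS83", "Geographic"),
    3857: ("EPSG:3857", "Projected"),
}


def is_srid_allowed(type_name: str, srid: int) -> bool:
    """Return True if `srid` is acceptable for GEOMETRY or GEOGRAPHY."""
    entry = SRID_TABLE.get(srid)
    if entry is None:
        return False  # SRID is not in this sample table
    _, crs_type = entry
    kind = type_name.upper()
    if kind == "GEOMETRY":
        return True  # geographic + projected + SRID 0
    if kind == "GEOGRAPHY":
        return crs_type == "Geographic"  # latitude/longitude systems only
    raise ValueError(f"unknown type: {type_name!r}")
```

Under this model, `is_srid_allowed("GEOGRAPHY", 3857)` is false because Web Mercator is a projected system, while `is_srid_allowed("GEOMETRY", 3857)` is true.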
