szehon-ho commented on code in PR #55207:
URL: https://github.com/apache/spark/pull/55207#discussion_r3047796739


##########
docs/sql-ref-geospatial-types.md:
##########
@@ -142,6 +142,92 @@ SELECT 
ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F00000000000
 * **Fixed-SRID columns**: Every value in the column must have the same SRID as 
the column type. Inserting a value with a different SRID can raise an error (or 
you can use `ST_SetSrid` to set the value’s SRID to match the column).
 * **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can 
have different SRIDs. Only valid SRIDs are allowed.
 * **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a 
fixed SRID per column; mixed-SRID types are for in-memory/query use. When 
writing to these formats, a concrete (fixed) SRID is required.
+### Supported SRIDs
+
+Spark includes a pre-built registry of standard Spatial Reference Identifiers 
(SRIDs) from the PROJ database, with overrides to support OGC standards. This 
registry enables validation and proper handling of coordinate systems for 
geospatial data.
+
+#### Commonly Used SRIDs
+
+| SRID | Name | Description | Typical Use Case |
+|------|------|-------------|------------------|
+| 4326 | WGS 84 | World Geodetic System 1984 (latitude/longitude) | GPS 
coordinates, global data (default for GEOGRAPHY) |
+| 3857 | Web Mercator | Pseudo-Mercator projection used by web mapping 
services | Web maps (Google Maps, OpenStreetMap, Bing Maps) |
+| 2154 | RGF93 / Lambert-93 | French national coordinate system | 
France-specific mapping and GIS |
+| 32633 | WGS 84 / UTM zone 33N | Universal Transverse Mercator, zone 33 North 
| Central Europe (6°E to 12°E) |
+| 32634 | WGS 84 / UTM zone 34N | Universal Transverse Mercator, zone 34 North 
| Eastern Europe (12°E to 18°E) |
+| 32635 | WGS 84 / UTM zone 35N | Universal Transverse Mercator, zone 35 North 
| Eastern Europe/Western Asia (18°E to 24°E) |

Review Comment:
   Consider adding a **CRS Identifier** column. Spark maps SRIDs to CRS strings 
internally, and these strings are visible to users in `df.schema.json()` output 
and in Parquet/Delta/Iceberg storage metadata. For example, `GEOMETRY(4326)` 
is stored as `geometry(OGC:CRS84)` in the JSON schema, not as `EPSG:4326`. This 
is a common source of confusion.
   
   The key mappings are:
   
   | SRID | CRS Identifier |
   |------|---------------|
   | 0 | `SRID:0` |
   | 3857 | `EPSG:3857` |
   | 4326 | `OGC:CRS84` |
   | 4267 | `OGC:CRS27` |
   | 4269 | `OGC:CRS83` |
   
   It is also worth noting which SRIDs are valid for GEOGRAPHY versus GEOMETRY. 
For instance, `GEOMETRY(3857)` works but `GEOGRAPHY(3857)` raises an error 
because 3857 is a projected (non-geographic) CRS. That's a real pitfall for users.
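
To make the suggested column concrete, here is a minimal sketch of the lookup it would document. It uses only the five mappings listed in this comment; the names `SRID_TO_CRS`, `crs_identifier`, and `schema_type_string` are hypothetical and are not Spark's internal registry or API.

```python
# Illustrative lookup built only from the mappings listed in this review comment.
# Assumption: this is a stand-in for documentation purposes, not Spark's registry.
SRID_TO_CRS = {
    0: "SRID:0",
    3857: "EPSG:3857",
    4326: "OGC:CRS84",
    4267: "OGC:CRS27",
    4269: "OGC:CRS83",
}


def crs_identifier(srid: int) -> str:
    """Return the CRS string for an SRID that appears in the table above."""
    try:
        return SRID_TO_CRS[srid]
    except KeyError:
        raise ValueError(f"SRID {srid} is not in this illustrative table") from None


def schema_type_string(kind: str, srid: int) -> str:
    """Format a type the way the comment describes it in JSON schema output."""
    return f"{kind}({crs_identifier(srid)})"


if __name__ == "__main__":
    # GEOMETRY(4326) shows up with the OGC identifier, not an EPSG one.
    print(schema_type_string("geometry", 4326))
```

A table column with these strings would let readers match what they declare in DDL against what they later see in schema JSON or storage metadata.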



##########
docs/sql-ref-geospatial-types.md:
##########
@@ -142,6 +142,92 @@ SELECT 
ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F00000000000
 * **Fixed-SRID columns**: Every value in the column must have the same SRID as 
the column type. Inserting a value with a different SRID can raise an error (or 
you can use `ST_SetSrid` to set the value’s SRID to match the column).
 * **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can 
have different SRIDs. Only valid SRIDs are allowed.
 * **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a 
fixed SRID per column; mixed-SRID types are for in-memory/query use. When 
writing to these formats, a concrete (fixed) SRID is required.
+### Supported SRIDs
+
+Spark includes a pre-built registry of standard Spatial Reference Identifiers 
(SRIDs) from the PROJ database, with overrides to support OGC standards. This 
registry enables validation and proper handling of coordinate systems for 
geospatial data.
+
+#### Commonly Used SRIDs
+
+| SRID | Name | Description | Typical Use Case |
+|------|------|-------------|------------------|
+| 4326 | WGS 84 | World Geodetic System 1984 (latitude/longitude) | GPS 
coordinates, global data (default for GEOGRAPHY) |
+| 3857 | Web Mercator | Pseudo-Mercator projection used by web mapping 
services | Web maps (Google Maps, OpenStreetMap, Bing Maps) |
+| 2154 | RGF93 / Lambert-93 | French national coordinate system | 
France-specific mapping and GIS |
+| 32633 | WGS 84 / UTM zone 33N | Universal Transverse Mercator, zone 33 North 
| Central Europe (6°E to 12°E) |
+| 32634 | WGS 84 / UTM zone 34N | Universal Transverse Mercator, zone 34 North 
| Eastern Europe (12°E to 18°E) |
+| 32635 | WGS 84 / UTM zone 35N | Universal Transverse Mercator, zone 35 North 
| Eastern Europe/Western Asia (18°E to 24°E) |
+
+The registry includes many additional SRIDs for various UTM zones, national 
coordinate systems, and other projections. For a complete list, refer to the 
[EPSG Geodetic Parameter Dataset](https://epsg.org/).
+
+#### Using Different SRIDs
+
+**Creating tables with specific SRIDs:**

Review Comment:
   Most of the examples in sections "Using Different SRIDs", "Converting 
between SRIDs", and "SRID Validation" repeat what the page already covers in 
"Creating Tables" (lines 62–79) and "Built-in Geospatial Functions" (lines 
129–137). Consider replacing them with examples that show genuinely new 
behavior:
   
   - **SRID validation error**: The 99999 case is useful — keep it.
   - **GEOGRAPHY vs GEOMETRY pitfall**: Show that `GEOGRAPHY(3857)` errors 
because 3857 is non-geographic — this is a real user trap not documented 
elsewhere.
   - **OGC CRS strings in metadata**: Show that `df.schema.json()` for 
`GEOMETRY(4326)` contains `OGC:CRS84`, so users know what to expect in 
Parquet/storage metadata.



##########
docs/sql-ref-geospatial-types.md:
##########
@@ -142,6 +142,92 @@ SELECT 
ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F00000000000
 * **Fixed-SRID columns**: Every value in the column must have the same SRID as 
the column type. Inserting a value with a different SRID can raise an error (or 
you can use `ST_SetSrid` to set the value’s SRID to match the column).
 * **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can 
have different SRIDs. Only valid SRIDs are allowed.
 * **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a 
fixed SRID per column; mixed-SRID types are for in-memory/query use. When 
writing to these formats, a concrete (fixed) SRID is required.
+### Supported SRIDs
+
+Spark includes a pre-built registry of standard Spatial Reference Identifiers 
(SRIDs) from the PROJ database, with overrides to support OGC standards. This 
registry enables validation and proper handling of coordinate systems for 
geospatial data.
+
+#### Commonly Used SRIDs
+
+| SRID | Name | Description | Typical Use Case |
+|------|------|-------------|------------------|
+| 4326 | WGS 84 | World Geodetic System 1984 (latitude/longitude) | GPS 
coordinates, global data (default for GEOGRAPHY) |
+| 3857 | Web Mercator | Pseudo-Mercator projection used by web mapping 
services | Web maps (Google Maps, OpenStreetMap, Bing Maps) |
+| 2154 | RGF93 / Lambert-93 | French national coordinate system | 
France-specific mapping and GIS |
+| 32633 | WGS 84 / UTM zone 33N | Universal Transverse Mercator, zone 33 North 
| Central Europe (6°E to 12°E) |
+| 32634 | WGS 84 / UTM zone 34N | Universal Transverse Mercator, zone 34 North 
| Eastern Europe (12°E to 18°E) |
+| 32635 | WGS 84 / UTM zone 35N | Universal Transverse Mercator, zone 35 North 
| Eastern Europe/Western Asia (18°E to 24°E) |
+
+The registry includes many additional SRIDs for various UTM zones, national 
coordinate systems, and other projections. For a complete list, refer to the 
[EPSG Geodetic Parameter Dataset](https://epsg.org/).
+
+#### Using Different SRIDs
+
+**Creating tables with specific SRIDs:**
+
+```sql
+-- Web Mercator projection (common for web mapping applications)
+CREATE TABLE web_map_data (
+  id BIGINT,
+  location GEOMETRY(3857)
+);
+
+-- UTM zone 33N for Central Europe
+CREATE TABLE europe_survey_data (
+  id BIGINT,
+  measurement_point GEOMETRY(32633)
+);
+
+-- French national grid
+CREATE TABLE france_cadastre (
+  id BIGINT,
+  parcel GEOMETRY(2154)
+);
+```
+
+**Converting between SRIDs:**

Review Comment:
   The heading "Converting between SRIDs" implies coordinate reprojection, but 
`ST_SetSrid` only changes metadata. Suggest renaming to something like 
**"Setting or Changing SRID Metadata"**.
   
   Also, the example changes a point from SRID 4326 (lat/lon in degrees) to 
3857 (Web Mercator in meters) — this produces a semantically incorrect result 
since the coordinates are still degree values but now labeled as meters. A 
better example would set SRID on data that was created without one, e.g. SRID 0 
→ 4326, which is the common real-world use case. The existing doc already shows 
an `ST_SetSrid` example (line 136) that does this correctly.
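
To illustrate the gap between relabeling and reprojecting, the sketch below decodes the WKB literal from the example and applies the standard spherical Web Mercator formulas. It is plain Python, not Spark; the helper names are hypothetical, and it assumes the point's x/y are longitude/latitude in degrees.

```python
import math
import struct

# The WKB literal used throughout the doc examples.
WKB_HEX = "0101000000000000000000F03F0000000000000040"

# Sphere radius used by Web Mercator (the WGS 84 semi-major axis), in meters.
EARTH_RADIUS_M = 6378137.0


def decode_wkb_point(hex_str):
    """Decode a 2D WKB point into (x, y)."""
    raw = bytes.fromhex(hex_str)
    byte_order = "<" if raw[0] == 1 else ">"  # 1 = little-endian
    (geom_type,) = struct.unpack(byte_order + "I", raw[1:5])
    if geom_type != 1:
        raise ValueError("this sketch only handles 2D WKB points")
    return struct.unpack(byte_order + "dd", raw[5:21])


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project lon/lat in degrees to Web Mercator x/y in meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


lon, lat = decode_wkb_point(WKB_HEX)            # (1.0, 2.0)
relabelled = (lon, lat)                         # what a metadata-only change keeps
reprojected = lonlat_to_web_mercator(lon, lat)  # what a real transform would yield

print("relabelled :", relabelled)
print("reprojected:", reprojected)
```

The relabeled point keeps the values 1 and 2, which would then be read as meters, while a real transformation moves the point more than a hundred kilometers from the projection origin on both axes.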



##########
docs/sql-ref-geospatial-types.md:
##########
@@ -142,6 +142,92 @@ SELECT 
ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F00000000000
 * **Fixed-SRID columns**: Every value in the column must have the same SRID as 
the column type. Inserting a value with a different SRID can raise an error (or 
you can use `ST_SetSrid` to set the value’s SRID to match the column).
 * **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can 
have different SRIDs. Only valid SRIDs are allowed.
 * **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a 
fixed SRID per column; mixed-SRID types are for in-memory/query use. When 
writing to these formats, a concrete (fixed) SRID is required.
+### Supported SRIDs
+
+Spark includes a pre-built registry of standard Spatial Reference Identifiers 
(SRIDs) from the PROJ database, with overrides to support OGC standards. This 
registry enables validation and proper handling of coordinate systems for 
geospatial data.
+
+#### Commonly Used SRIDs
+
+| SRID | Name | Description | Typical Use Case |
+|------|------|-------------|------------------|
+| 4326 | WGS 84 | World Geodetic System 1984 (latitude/longitude) | GPS 
coordinates, global data (default for GEOGRAPHY) |
+| 3857 | Web Mercator | Pseudo-Mercator projection used by web mapping 
services | Web maps (Google Maps, OpenStreetMap, Bing Maps) |
+| 2154 | RGF93 / Lambert-93 | French national coordinate system | 
France-specific mapping and GIS |
+| 32633 | WGS 84 / UTM zone 33N | Universal Transverse Mercator, zone 33 North 
| Central Europe (6°E to 12°E) |
+| 32634 | WGS 84 / UTM zone 34N | Universal Transverse Mercator, zone 34 North 
| Eastern Europe (12°E to 18°E) |
+| 32635 | WGS 84 / UTM zone 35N | Universal Transverse Mercator, zone 35 North 
| Eastern Europe/Western Asia (18°E to 24°E) |
+
+The registry includes many additional SRIDs for various UTM zones, national 
coordinate systems, and other projections. For a complete list, refer to the 
[EPSG Geodetic Parameter Dataset](https://epsg.org/).
+
+#### Using Different SRIDs
+
+**Creating tables with specific SRIDs:**
+
+```sql
+-- Web Mercator projection (common for web mapping applications)
+CREATE TABLE web_map_data (
+  id BIGINT,
+  location GEOMETRY(3857)
+);
+
+-- UTM zone 33N for Central Europe
+CREATE TABLE europe_survey_data (
+  id BIGINT,
+  measurement_point GEOMETRY(32633)
+);
+
+-- French national grid
+CREATE TABLE france_cadastre (
+  id BIGINT,
+  parcel GEOMETRY(2154)
+);
+```
+
+**Converting between SRIDs:**
+
+```sql
+-- Create a point in WGS 84 (SRID 4326)
+SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 4326) AS 
point_wgs84;
+
+-- Change SRID to Web Mercator (note: this only changes the SRID metadata, not 
the coordinates)
+SELECT ST_SetSrid(
+  ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 4326),
+  3857
+) AS point_web_mercator;
+```
+
+**Important:** `ST_SetSrid` only changes the SRID metadata; it does not 
transform coordinates. For actual coordinate transformation between different 
coordinate systems, use appropriate transformation functions or external tools.
+
+#### SRID Validation
+
+When creating GEOMETRY or GEOGRAPHY values, Spark validates that the specified 
SRID exists in the pre-built registry. Using an unsupported or invalid SRID 
will result in an error.
+
+```sql
+-- Valid: 4326 is in the registry
+SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 4326);

Review Comment:
   The 4326 and 3857 examples here repeat what's already shown in the "Built-in 
Geospatial Functions" section above. Consider trimming to just the 99999 error 
case — that's the genuinely new and useful example. You could also add a 
`GEOGRAPHY(3857)` failure example here, since that's a real pitfall not 
documented elsewhere.



##########
docs/sql-ref-geospatial-types.md:
##########
@@ -142,6 +142,92 @@ SELECT 
ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F00000000000
 * **Fixed-SRID columns**: Every value in the column must have the same SRID as 
the column type. Inserting a value with a different SRID can raise an error (or 
you can use `ST_SetSrid` to set the value’s SRID to match the column).
 * **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can 
have different SRIDs. Only valid SRIDs are allowed.
 * **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a 
fixed SRID per column; mixed-SRID types are for in-memory/query use. When 
writing to these formats, a concrete (fixed) SRID is required.
+### Supported SRIDs
+
+Spark includes a pre-built registry of standard Spatial Reference Identifiers 
(SRIDs) from the PROJ database, with overrides to support OGC standards. This 
registry enables validation and proper handling of coordinate systems for 
geospatial data.
+
+#### Commonly Used SRIDs
+
+| SRID | Name | Description | Typical Use Case |
+|------|------|-------------|------------------|
+| 4326 | WGS 84 | World Geodetic System 1984 (latitude/longitude) | GPS 
coordinates, global data (default for GEOGRAPHY) |
+| 3857 | Web Mercator | Pseudo-Mercator projection used by web mapping 
services | Web maps (Google Maps, OpenStreetMap, Bing Maps) |
+| 2154 | RGF93 / Lambert-93 | French national coordinate system | 
France-specific mapping and GIS |
+| 32633 | WGS 84 / UTM zone 33N | Universal Transverse Mercator, zone 33 North 
| Central Europe (6°E to 12°E) |
+| 32634 | WGS 84 / UTM zone 34N | Universal Transverse Mercator, zone 34 North 
| Eastern Europe (12°E to 18°E) |
+| 32635 | WGS 84 / UTM zone 35N | Universal Transverse Mercator, zone 35 North 
| Eastern Europe/Western Asia (18°E to 24°E) |
+
+The registry includes many additional SRIDs for various UTM zones, national 
coordinate systems, and other projections. For a complete list, refer to the 
[EPSG Geodetic Parameter Dataset](https://epsg.org/).
+
+#### Using Different SRIDs
+
+**Creating tables with specific SRIDs:**
+
+```sql
+-- Web Mercator projection (common for web mapping applications)
+CREATE TABLE web_map_data (
+  id BIGINT,
+  location GEOMETRY(3857)
+);
+
+-- UTM zone 33N for Central Europe
+CREATE TABLE europe_survey_data (
+  id BIGINT,
+  measurement_point GEOMETRY(32633)
+);
+
+-- French national grid
+CREATE TABLE france_cadastre (
+  id BIGINT,
+  parcel GEOMETRY(2154)
+);
+```
+
+**Converting between SRIDs:**
+
+```sql
+-- Create a point in WGS 84 (SRID 4326)
+SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 4326) AS 
point_wgs84;
+
+-- Change SRID to Web Mercator (note: this only changes the SRID metadata, not 
the coordinates)
+SELECT ST_SetSrid(
+  ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 4326),
+  3857
+) AS point_web_mercator;
+```
+
+**Important:** `ST_SetSrid` only changes the SRID metadata; it does not 
transform coordinates. For actual coordinate transformation between different 
coordinate systems, use appropriate transformation functions or external tools.
+
+#### SRID Validation
+
+When creating GEOMETRY or GEOGRAPHY values, Spark validates that the specified 
SRID exists in the pre-built registry. Using an unsupported or invalid SRID 
will result in an error.
+
+```sql
+-- Valid: 4326 is in the registry
+SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 4326);
+-- Returns: GEOMETRY with SRID 4326
+
+-- Valid: 3857 (Web Mercator) is in the registry
+SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 3857);
+-- Returns: GEOMETRY with SRID 3857
+
+-- Error: 99999 is not a valid SRID in the registry
+SELECT ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040', 99999);
+-- Throws error: Invalid SRID
+```
+
+#### SRID 0 (Unspecified)
+
+SRID 0 represents an unspecified or unknown coordinate system. It is allowed 
for GEOMETRY types but should be used with caution:

Review Comment:
   A few issues here:
   
   1. **"should be used with caution"** is overstated — SRID 0 is the default 
for `ST_GeomFromWKB(wkb)` and is actively used in `CREATE TABLE` (e.g., `CREATE 
TABLE t (geom GEOMETRY(0)) USING PARQUET` in the test suite). It's a standard 
convention (PostGIS uses the same).
   
   2. **Missing GEOGRAPHY restriction** — SRID 0 is **not** valid for GEOGRAPHY 
types (it's registered as non-geographic, so 
`GeographicSpatialReferenceSystemMapper` rejects it). This is important to 
document.
   
   3. **Could be confused with `GEOMETRY(ANY)`** — Worth clarifying that 
`GEOMETRY(0)` means a fixed SRID of 0 (Cartesian, no defined CRS), not "per-row 
SRID." Per-row SRIDs use `GEOMETRY(ANY)`.
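
The rules discussed in this thread can be summarized with a toy model. This is only a sketch: the three entries and their geographic flags come from statements in these comments, the names `TOY_REGISTRY` and `is_valid_srid` are hypothetical, and none of this is Spark's `GeographicSpatialReferenceSystemMapper`.

```python
# Toy registry: SRID -> whether the CRS is geographic.
# Assumption: flags are taken from the review discussion, not read from Spark.
TOY_REGISTRY = {
    0: False,     # Cartesian, no defined CRS
    3857: False,  # projected (Web Mercator)
    4326: True,   # geographic (WGS 84)
}


def is_valid_srid(type_name: str, srid: int) -> bool:
    """Return True if a fixed-SRID column of this type would accept the SRID."""
    if srid not in TOY_REGISTRY:
        return False  # unknown SRIDs such as 99999 are rejected for both types
    kind = type_name.upper()
    if kind == "GEOMETRY":
        return True  # any registered SRID, including 0
    if kind == "GEOGRAPHY":
        return TOY_REGISTRY[srid]  # only geographic CRSs
    raise ValueError(f"unknown type: {type_name}")


if __name__ == "__main__":
    for kind in ("GEOMETRY", "GEOGRAPHY"):
        for srid in (0, 3857, 4326, 99999):
            print(f"{kind}({srid}):", is_valid_srid(kind, srid))
```

Note that `GEOMETRY(ANY)` is deliberately absent from this model: it describes per-row SRIDs, which is a different concept from the fixed SRID 0.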



##########
docs/sql-ref-geospatial-types.md:
##########
@@ -142,6 +142,92 @@ SELECT 
ST_Srid(ST_SetSrid(ST_GeomFromWKB(X'0101000000000000000000F03F00000000000
 * **Fixed-SRID columns**: Every value in the column must have the same SRID as 
the column type. Inserting a value with a different SRID can raise an error (or 
you can use `ST_SetSrid` to set the value’s SRID to match the column).
 * **Mixed-SRID columns** (`GEOMETRY(ANY)` or `GEOGRAPHY(ANY)`): Values can 
have different SRIDs. Only valid SRIDs are allowed.
 * **Storage**: Parquet, Delta, and Iceberg store geometry/geography with a 
fixed SRID per column; mixed-SRID types are for in-memory/query use. When 
writing to these formats, a concrete (fixed) SRID is required.
+### Supported SRIDs
+
+Spark includes a pre-built registry of standard Spatial Reference Identifiers 
(SRIDs) from the PROJ database, with overrides to support OGC standards. This 
registry enables validation and proper handling of coordinate systems for 
geospatial data.
+
+#### Commonly Used SRIDs
+
+| SRID | Name | Description | Typical Use Case |
+|------|------|-------------|------------------|
+| 4326 | WGS 84 | World Geodetic System 1984 (latitude/longitude) | GPS 
coordinates, global data (default for GEOGRAPHY) |
+| 3857 | Web Mercator | Pseudo-Mercator projection used by web mapping 
services | Web maps (Google Maps, OpenStreetMap, Bing Maps) |
+| 2154 | RGF93 / Lambert-93 | French national coordinate system | 
France-specific mapping and GIS |
+| 32633 | WGS 84 / UTM zone 33N | Universal Transverse Mercator, zone 33 North 
| Central Europe (6°E to 12°E) |
+| 32634 | WGS 84 / UTM zone 34N | Universal Transverse Mercator, zone 34 North 
| Eastern Europe (12°E to 18°E) |
+| 32635 | WGS 84 / UTM zone 35N | Universal Transverse Mercator, zone 35 North 
| Eastern Europe/Western Asia (18°E to 24°E) |
+
+The registry includes many additional SRIDs for various UTM zones, national 
coordinate systems, and other projections. For a complete list, refer to the 
[EPSG Geodetic Parameter Dataset](https://epsg.org/).

Review Comment:
   The registry also includes **ESRI** entries (e.g., `ESRI:102100`), not just 
EPSG. It is also pinned to **PROJ 9.7.1** rather than synced live with EPSG. The 
link to epsg.org could be misleading, since users may find SRIDs there that 
aren't in Spark's registry, or miss ESRI SRIDs that are. Consider referencing 
the actual registry CSV, or at least mentioning the PROJ version and the ESRI 
inclusion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

