jiayuasu opened a new pull request, #2661:
URL: https://github.com/apache/sedona/pull/2661

   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/latest/community/develop/)
   
   ## Is this PR related to a ticket?
   
   - Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2376
   
   ## What changes were proposed in this PR?
   
   The GeoParquet reader currently ignores CRS metadata when deserializing 
geometries, always setting SRID=0. This PR fixes that by extracting the SRID 
from the PROJJSON CRS metadata and setting it on each deserialized geometry.
   
   ### Changes
   
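   **`GeoParquetMetaData.scala`** - Added `extractSridFromCrs(crs: Option[JValue]): Int`, which uses proj4sedona's `Proj.toAuthority()` to parse PROJJSON and extract the authority/code pair. The mapping follows the GeoParquet spec's rules for an omitted or null CRS and falls back to SRID 0 when no EPSG code can be determined: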
   - CRS omitted (None) → SRID 4326 (OGC:CRS84 default)
   - CRS explicitly null → SRID 0 (unknown)
   - PROJJSON with EPSG authority → the EPSG integer code
   - PROJJSON with OGC:CRS84 → SRID 4326
   - Anything else (non-EPSG authority, no id field, parse error) → SRID 0
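
The mapping rules above can be sketched in plain Scala. This is only an illustration, not the code in this PR: `CrsField` and its cases are hypothetical stand-ins for the `Option[JValue]` input and for the authority/code pair returned by proj4sedona, and exact, case-sensitive authority matching is an assumption.

```scala
object SridMapping {
  // Hypothetical model of the "crs" field of a GeoParquet column.
  sealed trait CrsField
  case object CrsOmitted extends CrsField   // "crs" key absent
  case object CrsNull extends CrsField      // "crs": null
  case object CrsWithoutId extends CrsField // PROJJSON with no resolvable id
  final case class CrsId(authority: String, code: String) extends CrsField

  def sridFor(crs: CrsField): Int = crs match {
    case CrsOmitted            => 4326 // spec default is OGC:CRS84
    case CrsNull               => 0    // explicitly unknown
    case CrsId("EPSG", code)   => scala.util.Try(code.toInt).getOrElse(0)
    case CrsId("OGC", "CRS84") => 4326
    case _                     => 0    // non-EPSG authority or missing id
  }
}
```

For example, `SridMapping.sridFor(SridMapping.CrsId("EPSG", "32632"))` yields 32632, while an `IAU` authority falls through to 0.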
   
   **`GeoParquetSchemaConverter.scala`** - Added `getSrid(columnName: String): Int` to look up the SRID for a geometry column from parsed GeoParquet metadata.
   
   **`GeoParquetRowConverter.scala`** - After `WKBReader.read()`, calls 
`geom.setSRID(srid)` to apply the SRID from file metadata to each deserialized 
geometry.
   
   **`pom.xml`** - Bumped proj4sedona dependency from 0.0.4 to 0.0.5 to use the 
new `Proj.toAuthority()` API.
   
   ### How it works
   
   1. When a GeoParquet file is read, `GeoParquetSchemaConverter` parses the 
file-level `"geo"` metadata and extracts the CRS PROJJSON for each geometry 
column.
   2. For each geometry column, `extractSridFromCrs()` converts the PROJJSON to 
a compact JSON string, passes it to `new Proj(jsonStr).toAuthority()`, and maps 
the result to an integer SRID.
   3. `GeoParquetRowConverter` receives the SRID and sets it on every geometry 
deserialized from that column.
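
Step 3 can be sketched with JTS alone. This is a minimal illustration rather than the converter code in this PR: `SridOnRead` and `readWithSrid` are hypothetical names, and the SRID is passed in directly instead of being looked up from file metadata.

```scala
import org.locationtech.jts.geom.{Coordinate, Geometry, GeometryFactory}
import org.locationtech.jts.io.{WKBReader, WKBWriter}

object SridOnRead {
  // Hypothetical helper: decode WKB, then stamp the column's SRID on the result.
  def readWithSrid(wkb: Array[Byte], srid: Int): Geometry = {
    val geom = new WKBReader().read(wkb)
    geom.setSRID(srid)
    geom
  }

  def main(args: Array[String]): Unit = {
    val point = new GeometryFactory().createPoint(new Coordinate(1.0, 2.0))
    // The default WKBWriter emits plain WKB, which does not embed an SRID.
    val wkb = new WKBWriter().write(point)
    println(readWithSrid(wkb, 6318).getSRID)
  }
}
```

Because plain WKB carries no SRID, the value has to come from the column's CRS metadata, which is why it is applied after decoding.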
   
   ## How was this patch tested?
   
   12 new tests added to `geoparquetIOTests.scala` (all 34 tests in the suite 
pass):
   
   **5 integration tests** (write GeoParquet with specific CRS, read back, 
verify SRID):
   - EPSG PROJJSON (NAD83(2011), EPSG:6318) → SRID 6318
   - Omitted CRS → SRID 4326
   - Null CRS → SRID 0
   - CRS without EPSG id → SRID 0
   - Per-column multi-CRS (g0=EPSG:4326, g1=EPSG:32632) → SRID 4326 for g0 and 32632 for g1
   
   **7 unit tests** for `extractSridFromCrs`:
   - None → 4326
   - JNull → 0
   - EPSG:4326 PROJJSON → 4326
   - EPSG:32632 PROJJSON → 32632
   - OGC:CRS84 → 4326
   - No id field → 0
   - Non-EPSG authority (IAU:49900) → 0
   
   ## Did this PR include necessary documentation updates?
   
   - No, this PR does not affect any public API so no need to change the 
documentation.
   

