jiayuasu opened a new pull request, #2667: URL: https://github.com/apache/sedona/pull/2667
## Did you read the Contributor Guide?

- Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Developer Guide](https://sedona.apache.org/latest/community/develop/)

## Is this PR related to a ticket?

- Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2664

## What changes were proposed in this PR?

When writing GeoParquet files, the writer now automatically derives PROJJSON CRS metadata from the geometry SRID, instead of always writing `null` (unknown CRS) when no explicit `geoparquet.crs` option is provided.

### Behavior

- **SRID 4326**: CRS field is **omitted** from GeoParquet metadata, since the GeoParquet spec defines the default CRS as OGC:CRS84 (equivalent to EPSG:4326). This is a no-op that keeps file metadata minimal.
- **SRID > 0 (non-4326)**: Generates PROJJSON via `CRSSerializer.toProjJson()` from proj4sedona and writes it to the `crs` field. The PROJJSON includes the `id` field (e.g., `{"authority":"EPSG","code":32632}`) for interoperability with other tools.
- **SRID 0 or mixed SRIDs**: Falls back to `null` (unknown CRS), consistent with GeoPandas behavior.
- **Explicit `geoparquet.crs` option**: Always takes precedence over SRID-derived CRS.

### Changes

**pom.xml**

- Bump proj4sedona version from 0.0.5 to 0.0.6 (adds `id` field support in `CRSSerializer.toProjJson()`).

**GeoParquetMetaData.scala**

- Added `sridToProjJson(srid: Int): Option[JValue]` utility method. Returns `None` for SRID 0 and 4326 (default CRS), generates PROJJSON for other SRIDs using proj4sedona `CRSSerializer.toProjJson()`.

**GeoParquetWriteSupport.scala**

- Track observed SRID per geometry column during writing (`_srid`, `_mixedSrids`, `observedSrid`).
- Added `userExplicitlySetDefaultCrs` flag to distinguish "no option provided" from "user explicitly set CRS".
- In `finalizeWrite()`: when no explicit CRS option is provided, derive CRS from the observed SRID. For SRID 4326, omit CRS entirely. For other SRIDs, generate PROJJSON. For SRID 0 or mixed SRIDs, write `null`.

**geoparquetIOTests.scala**

- "GeoParquet save should omit CRS for SRID 4326 per GeoParquet default": verifies CRS is omitted and round-trip preserves SRID 4326.
- "GeoParquet save should auto-generate projjson from non-default SRID": verifies PROJJSON with EPSG:32632 identifier and round-trip.
- "GeoParquet save should keep crs null when geometry SRID is 0": verifies `null` CRS for unknown SRID.
- "GeoParquet save should use explicit CRS option over SRID-derived CRS": verifies explicit option takes precedence.
- "GeoParquet save should keep crs null for mixed SRIDs in one column": verifies `null` CRS for mixed SRIDs.

**geoparquet-sedona-spark.md**

- Document the automatic CRS from SRID behavior.

## How was this patch tested?

All 46 geoparquetIOTests pass:

```
mvn test -pl spark/common -Dlog4j.version=2.19.0 -Dsuites=org.apache.sedona.sql.geoparquetIOTests -DfailIfNoTests=false
```

## Did this PR include necessary documentation updates?

- Yes, I have updated the documentation.
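
## Illustrative sketch of the CRS resolution rules

The precedence described under "Behavior" can be modelled as a small pure function. The Python sketch below is only an illustration of those rules, not the Scala code in `GeoParquetWriteSupport.scala`. The names `crs_fragment`, `observed_srid`, and `stub_projjson` are hypothetical, and the stub returns just an `id` member where real PROJJSON from `CRSSerializer.toProjJson()` would carry the full CRS definition. Treating an empty column like SRID 0 is an assumption of the sketch; the PR text does not cover that case.

```python
import json
from typing import Any, Dict, Iterable, Optional

# SRID that the PR maps to GeoParquet's default CRS (OGC:CRS84).
DEFAULT_SRID = 4326


def stub_projjson(srid: int) -> Dict[str, Any]:
    # Hypothetical stand-in for proj4sedona's CRSSerializer.toProjJson().
    # Only the "id" member is modelled here.
    return {"id": {"authority": "EPSG", "code": srid}}


def observed_srid(srids: Iterable[int]) -> int:
    # A single shared SRID is kept; a mixed column collapses to 0 (unknown).
    # Assumption: an empty column is also reported as 0.
    distinct = set(srids)
    return distinct.pop() if len(distinct) == 1 else 0


def crs_fragment(
    srids: Iterable[int], explicit_crs: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    # Returns the "crs" portion of one geometry column's metadata:
    #   {}            -> key omitted (reader falls back to the default CRS)
    #   {"crs": None} -> serialized as null (unknown CRS)
    #   {"crs": obj}  -> PROJJSON object
    if explicit_crs is not None:
        # An explicit option always wins over the SRID-derived value.
        return {"crs": explicit_crs}
    srid = observed_srid(srids)
    if srid == DEFAULT_SRID:
        return {}
    if srid > 0:
        return {"crs": stub_projjson(srid)}
    return {"crs": None}


if __name__ == "__main__":
    cases = {
        "all 4326": [4326, 4326],
        "all 32632": [32632, 32632],
        "unset (0)": [0],
        "mixed": [4326, 32632],
    }
    for label, srids in cases.items():
        print(label, json.dumps(crs_fragment(srids)))
```

Returning a dictionary fragment rather than a single value keeps the difference between an omitted `crs` key and a `crs` key set to `null` visible. The GeoParquet specification gives those two states different meanings: default CRS versus unknown CRS.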
