jiayuasu opened a new pull request, #2665: URL: https://github.com/apache/sedona/pull/2665
## Did you read the Contributor Guide?

- Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Developer Guide](https://sedona.apache.org/latest/community/develop/)

## Is this PR related to a ticket?

- Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2646

## What changes were proposed in this PR?

Add a `geoparquet.covering.mode` option to control automatic covering metadata generation when writing GeoParquet files.

### Behavior

- **auto (default)**: For GeoParquet 1.1.0 writes, automatically generate or reuse `<geometryColumnName>_bbox` covering columns and write corresponding covering metadata. If the user has already provided explicit `geoparquet.covering` or `geoparquet.covering.<col>` options, those take precedence and auto-generation is skipped.
- **legacy**: No automatic covering generation. Explicit covering options still work as before.

### Changes

**GeoParquetMetaData.scala**

- Added constants: `GEOPARQUET_COVERING_MODE_KEY`, `GEOPARQUET_COVERING_MODE_AUTO`, `GEOPARQUET_COVERING_MODE_LEGACY`.

**GeoParquetWriteSupport.scala**

- Parse and validate `geoparquet.covering.mode` from Hadoop configuration. Throw `IllegalArgumentException` for invalid values.
- `maybeAutoGenerateCoveringColumns()`: when auto mode is enabled and no explicit covering options are provided, for each geometry column: reuse an existing valid `_bbox` struct column, or generate one from the geometry envelope.
- Guard against key collision when a geometry column is named "mode" (skip `geoparquet.covering.mode` in per-column covering parsing).
- Gracefully handle the case where an existing `_bbox` column has invalid structure (log warning and skip instead of crashing).

**geoparquetIOTests.scala**

- Test auto-covering reuses existing valid `geometry_bbox` column.
- Test auto-covering generates `geometry_bbox` when no covering column exists.
- Test legacy mode disables auto-generation.
- Test invalid mode is rejected with a clear error message.
- Test auto-covering for multiple geometry columns.
- Test auto-covering is not applied for non-1.1.0 versions.
- Fix round-trip comparison tests to select only original columns (auto-covering adds `geometry_bbox`).

**geoparquet-sedona-spark.md**

- Document the `geoparquet.covering.mode` option, default behavior, and how to opt out.
- Note that the default GeoParquet version is `1.1.0` since `v1.9.0`.

## How was this patch tested?

All 40 geoparquetIOTests pass:

```
mvn test -pl spark/common -Dlog4j.version=2.19.0 -DwildcardSuites=org.apache.sedona.sql.geoparquetIOTests
```

## Did this PR include necessary documentation updates?

- Yes, I have updated the documentation.
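
The rules for `geoparquet.covering.mode` described in this PR can be sketched as a small decision function. This is a language-neutral illustration in Python, not the Scala code from the PR. The function names (`parse_mode`, `has_explicit_covering`, `plan_coverings`) and the dict-based schema are hypothetical. The validity check (a struct with `xmin`, `ymin`, `xmax`, `ymax` fields) and the order in which the conditions are checked are assumptions.

```python
import logging

COVERING_KEY = "geoparquet.covering"
COVERING_MODE_KEY = "geoparquet.covering.mode"
VALID_MODES = ("auto", "legacy")
# Assumption: a "valid" bbox struct has at least these four fields.
BBOX_FIELDS = {"xmin", "ymin", "xmax", "ymax"}

log = logging.getLogger("covering_sketch")


def parse_mode(conf):
    """Return the covering mode (default 'auto'); reject unknown values."""
    mode = conf.get(COVERING_MODE_KEY, "auto")
    if mode not in VALID_MODES:
        raise ValueError(
            f"Invalid {COVERING_MODE_KEY}: {mode!r}; expected one of {VALID_MODES}"
        )
    return mode


def has_explicit_covering(conf):
    """True if the user set `geoparquet.covering` or `geoparquet.covering.<col>`."""
    prefix = COVERING_KEY + "."
    for key in conf:
        if key == COVERING_MODE_KEY:
            # The mode key is never a per-column option, even if a
            # geometry column happens to be named "mode".
            continue
        if key == COVERING_KEY or key.startswith(prefix):
            return True
    return False


def plan_coverings(conf, version, geometry_columns, schema):
    """Decide per geometry column whether to reuse or generate `<col>_bbox`.

    `schema` maps column name -> set of struct field names, or None for
    non-struct columns. Returns {geometry column: "reuse" | "generate"}.
    """
    mode = parse_mode(conf)
    if mode == "legacy" or version != "1.1.0" or has_explicit_covering(conf):
        return {}  # no automatic covering generation
    plan = {}
    for geom in geometry_columns:
        bbox = f"{geom}_bbox"
        if bbox not in schema:
            plan[geom] = "generate"  # derive bbox from the geometry envelope
        elif schema[bbox] is not None and BBOX_FIELDS <= schema[bbox]:
            plan[geom] = "reuse"  # existing column already has the right shape
        else:
            # Invalid existing column: warn and skip rather than fail the write.
            log.warning("Column %s is not a valid bbox struct; skipping", bbox)
    return plan
```

For example, under these assumptions `plan_coverings({}, "1.1.0", ["geometry"], {"geometry": None})` plans to generate `geometry_bbox`, while the same call with `{"geoparquet.covering.mode": "legacy"}` plans nothing.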
