jiayuasu opened a new pull request, #2665:
URL: https://github.com/apache/sedona/pull/2665

   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor Rules](https://sedona.apache.org/latest/community/rule/) and [Contributor Developer Guide](https://sedona.apache.org/latest/community/develop/).
   
   ## Is this PR related to a ticket?
   
   - Yes, and the PR name follows the format `[GH-XXX] my subject`. Closes #2646
   
   ## What changes were proposed in this PR?
   
   Add a `geoparquet.covering.mode` option to control automatic generation of covering metadata when writing GeoParquet files.
   
   ### Behavior
   
   - **auto (default)**: For GeoParquet 1.1.0 writes, automatically generate or reuse a `<geometryColumnName>_bbox` covering column for each geometry column and write the corresponding covering metadata. If the user has already provided an explicit `geoparquet.covering` or `geoparquet.covering.<col>` option, it takes precedence and auto-generation is skipped.
   - **legacy**: No automatic covering generation. Explicit covering options still work as before.
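   The mode selection described above can be sketched in plain Scala. This is a minimal, hypothetical sketch. The option keys and the `auto`/`legacy` values come from this PR. `CoveringModeSketch` and its methods are illustrative names, not Sedona's API. The exact string comparison against `"1.1.0"` is also an assumption.

   ```scala
   // Hypothetical sketch of covering-mode resolution; not Sedona's actual code.
   object CoveringModeSketch {
     val ModeKey = "geoparquet.covering.mode"
     val Auto = "auto"
     val Legacy = "legacy"
     val CoveringKey = "geoparquet.covering"

     /** Returns the validated mode, defaulting to "auto" when the option is unset. */
     def resolveMode(options: Map[String, String]): String =
       options.getOrElse(ModeKey, Auto) match {
         case m @ (Auto | Legacy) => m
         case other =>
           throw new IllegalArgumentException(
             s"Invalid value '$other' for $ModeKey; expected '$Auto' or '$Legacy'")
       }

     /** True if `geoparquet.covering` or any `geoparquet.covering.<col>` is set.
       * The mode key itself is excluded, so a column named "mode" cannot trigger this. */
     def hasExplicitCovering(options: Map[String, String]): Boolean =
       options.keys.exists { k =>
         k == CoveringKey || (k.startsWith(CoveringKey + ".") && k != ModeKey)
       }

     /** Auto-covering applies only in auto mode, for 1.1.0 writes, with no explicit options. */
     def autoCoveringApplies(options: Map[String, String], version: String): Boolean = {
       val mode = resolveMode(options) // validate first, so bad values always fail
       mode == Auto && version == "1.1.0" && !hasExplicitCovering(options)
     }
   }
   ```

   In this sketch, the mode is validated before the other conditions are checked, so an invalid value is rejected even for a 1.0.0 write. Whether the PR orders its checks the same way is an assumption.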
   
   ### Changes
   
   **GeoParquetMetaData.scala**
   
   - Added constants `GEOPARQUET_COVERING_MODE_KEY`, `GEOPARQUET_COVERING_MODE_AUTO`, and `GEOPARQUET_COVERING_MODE_LEGACY`.
   
   **GeoParquetWriteSupport.scala**
   
   - Parse and validate `geoparquet.covering.mode` from the Hadoop configuration, throwing `IllegalArgumentException` for invalid values.
   - `maybeAutoGenerateCoveringColumns()`: when auto mode is enabled and no explicit covering options are provided, for each geometry column either reuse an existing valid `<geometryColumnName>_bbox` struct column or generate one from the geometry envelope.
   - Guard against a key collision when a geometry column is named `mode`. Its per-column key would be `geoparquet.covering.mode`, identical to the mode option, so that key is skipped during per-column covering parsing.
   - Gracefully handle an existing `_bbox` column with an invalid structure: log a warning and skip it instead of failing the write.
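   The per-column logic above can be approximated outside Spark. This is a hypothetical sketch, not the PR's implementation, and it rests on several assumptions:

   - `BboxCoveringSketch`, `CoveringPlan`, and its cases are illustrative names.
   - A column's schema is reduced to a map from struct field name to an "is double" flag.
   - A valid covering struct has double fields `xmin`, `ymin`, `xmax`, `ymax`, following the GeoParquet 1.1.0 bbox convention rather than anything stated in the PR.
   - The envelope is computed from a plain coordinate list instead of a real geometry.
   - Empty input returns `None`, since the PR does not say how empty geometries are handled.

   ```scala
   // Hypothetical sketch of per-column covering planning; names are illustrative.
   sealed trait CoveringPlan
   final case class Reuse(column: String) extends CoveringPlan
   final case class Generate(column: String) extends CoveringPlan
   final case class Skip(reason: String) extends CoveringPlan

   object BboxCoveringSketch {
     val ModeKey = "geoparquet.covering.mode"
     val PerColumnPrefix = "geoparquet.covering."
     // Assumed field layout of a valid bbox covering struct.
     val BboxFields = Seq("xmin", "ymin", "xmax", "ymax")

     /** Per-column covering options keyed by column name, ignoring the mode key. */
     def perColumnCoverings(options: Map[String, String]): Map[String, String] =
       options.collect {
         case (k, v) if k.startsWith(PerColumnPrefix) && k != ModeKey =>
           k.stripPrefix(PerColumnPrefix) -> v
       }

     /** `schema` maps top-level column names to Some(field -> isDouble) for
       * struct columns, or None for non-struct columns. */
     def plan(geomCol: String, schema: Map[String, Option[Map[String, Boolean]]]): CoveringPlan = {
       val bboxCol = s"${geomCol}_bbox"
       schema.get(bboxCol) match {
         case None => Generate(bboxCol) // no such column: derive it from the envelope
         case Some(Some(fields)) if BboxFields.forall(f => fields.getOrElse(f, false)) =>
           Reuse(bboxCol) // existing struct has all four double fields
         case Some(_) =>
           // Invalid structure: warn and skip instead of failing the write.
           System.err.println(s"WARN: $bboxCol has an invalid bbox structure; skipping")
           Skip(s"invalid structure in $bboxCol")
       }
     }

     /** Envelope (xmin, ymin, xmax, ymax) of a coordinate list; None if empty. */
     def envelope(coords: Seq[(Double, Double)]): Option[(Double, Double, Double, Double)] =
       if (coords.isEmpty) None
       else {
         val xs = coords.map(_._1)
         val ys = coords.map(_._2)
         Some((xs.min, ys.min, xs.max, ys.max))
       }
   }
   ```

   Returning a `Skip` value with a warning, rather than throwing, mirrors the PR's stated choice that a malformed `_bbox` column should not abort the whole write.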
   
   **geoparquetIOTests.scala**
   
   - Test auto-covering reuses existing valid `geometry_bbox` column.
   - Test auto-covering generates `geometry_bbox` when no covering column exists.
   - Test legacy mode disables auto-generation.
   - Test invalid mode is rejected with a clear error message.
   - Test auto-covering for multiple geometry columns.
   - Test auto-covering is not applied for non-1.1.0 versions.
   - Fix round-trip comparison tests to select only the original columns, since auto-covering adds `geometry_bbox`.
   
   **geoparquet-sedona-spark.md**
   
   - Document the `geoparquet.covering.mode` option, its default behavior, and how to opt out.
   - Note that the default GeoParquet version has been `1.1.0` since Sedona `v1.9.0`.
   
   ## How was this patch tested?
   
   All 40 tests in `geoparquetIOTests` pass:
   
   ```
   mvn test -pl spark/common -Dlog4j.version=2.19.0 \
     -DwildcardSuites=org.apache.sedona.sql.geoparquetIOTests
   ```
   
   ## Did this PR include necessary documentation updates?
   
   - Yes, I have updated the documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.