Kontinuation commented on issue #1296:
URL: https://github.com/apache/sedona/issues/1296#issuecomment-2029885038

   The `geo` metadata in the Parquet footers may differ across the written
GeoParquet files, especially the `bbox` field. This makes the default Parquet
footer metadata merging process fail with the following exception:
   
   ```
   java.lang.RuntimeException: could not merge metadata: key geo has conflicting values: [{"version":"1.0.0","primary_column":"geom","columns":{"geom":{"encoding":"WKB","geometry_types":["Polygon"],"bbox":[1.0,1.0,9998.0,9998.0],"crs":null}}}, {"version":"1.0.0","primary_column":"geom","columns":{"geom":{"encoding":"WKB","geometry_types":["Polygon"],"bbox":[0.0,0.0,10000.0,10000.0],"crs":null}}}]
        at org.apache.parquet.hadoop.metadata.StrictKeyValueMetadataMergeStrategy.merge(StrictKeyValueMetadataMergeStrategy.java:36)
        at org.apache.parquet.hadoop.metadata.GlobalMetaData.merge(GlobalMetaData.java:106)
        at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:1451)
        at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:1422)
        at org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:1383)
        at org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:84)
        at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:50)
        at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:192)
   ```
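
To illustrate why the merge fails, here is a minimal Python sketch of a strict key-value merge. It is a simplified stand-in, not the actual Parquet code: the function name `strict_merge` and the dict-per-footer representation are assumptions made for illustration. Any key whose value differs between two footers aborts the merge, which is what happens to `geo` once two files carry different `bbox` values.

```python
def strict_merge(footers):
    """Merge per-file key-value metadata, failing on any conflicting key.

    `footers` is a list of dicts, one per written file (hypothetical
    representation chosen for this sketch).
    """
    merged = {}
    for key_values in footers:
        for key, value in key_values.items():
            if key in merged and merged[key] != value:
                raise RuntimeError(
                    f"could not merge metadata: key {key} has conflicting values"
                )
            merged[key] = value
    return merged


# Two files whose `geo` value differs only in the bbox field.
footer_a = {"geo": '{"columns":{"geom":{"bbox":[1.0,1.0,9998.0,9998.0]}}}'}
footer_b = {"geo": '{"columns":{"geom":{"bbox":[0.0,0.0,10000.0,10000.0]}}}'}

try:
    strict_merge([footer_a, footer_b])
    merge_failed = False
except RuntimeError:
    merge_failed = True
```

Because the values are compared as opaque strings, even a bbox difference that is semantically harmless counts as a conflict.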
   
   We have to implement an output committer for GeoParquet to merge `geo`
metadata properly. If your use case does not need to read the `geo` metadata
from the `_common_metadata` or `_metadata` file, we can simply ignore the `geo`
metadata when generating such files.
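
As a rough idea of what merging `geo` metadata properly could mean, the following Python sketch unions the per-column `bbox` and `geometry_types` of several `geo` JSON strings. It is only an illustration under stated assumptions, not Sedona's implementation: `merge_geo` is a hypothetical helper, bounding boxes are assumed to be 2D (`[xmin, ymin, xmax, ymax]`), and all other fields (`version`, `primary_column`, `encoding`, `crs`) are assumed to agree and are taken from the first file.

```python
import json


def merge_geo(geo_values):
    """Merge several `geo` JSON strings into one (hypothetical helper)."""
    merged = None
    for raw in geo_values:
        geo = json.loads(raw)
        if merged is None:
            merged = geo
            continue
        for name, col in geo["columns"].items():
            target = merged["columns"].setdefault(name, col)
            if target is col:
                continue  # column seen for the first time, nothing to merge
            # Union the geometry types; an empty list is treated as
            # "unknown", so the merged result stays empty in that case.
            a_types = target.get("geometry_types", [])
            b_types = col.get("geometry_types", [])
            if a_types and b_types:
                target["geometry_types"] = sorted(set(a_types) | set(b_types))
            else:
                target["geometry_types"] = []
            # Union the bounding boxes (assumed 2D). If either file lacks
            # a bbox, the merged extent is unknown, so drop the field.
            a_box, b_box = target.get("bbox"), col.get("bbox")
            if a_box and b_box:
                target["bbox"] = [
                    min(a_box[0], b_box[0]),
                    min(a_box[1], b_box[1]),
                    max(a_box[2], b_box[2]),
                    max(a_box[3], b_box[3]),
                ]
            else:
                target.pop("bbox", None)
    return json.dumps(merged)


# The two conflicting values from the exception above.
geo_a = '{"version":"1.0.0","primary_column":"geom","columns":{"geom":{"encoding":"WKB","geometry_types":["Polygon"],"bbox":[1.0,1.0,9998.0,9998.0],"crs":null}}}'
geo_b = '{"version":"1.0.0","primary_column":"geom","columns":{"geom":{"encoding":"WKB","geometry_types":["Polygon"],"bbox":[0.0,0.0,10000.0,10000.0],"crs":null}}}'
merged_geo = json.loads(merge_geo([geo_a, geo_b]))
```

The simpler alternative mentioned above, ignoring `geo` when generating the summary files, would amount to dropping that key before merging instead of calling a helper like this one.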

