Hi,
I've had a chance to look at your test dataset. In
https://github.com/OSGeo/gdal/pull/10477, I've reduced the runtime to 8
minutes (with GeoParquet output, without spatial sorting), by optimizing
some implementation details. I believe this could be further reduced as
most of the time is still spent in malloc/free of temporary objects (the
output is 90 million polygons!) and some objects could be reused, but
that would require more extensive changes.
Even
On 01/07/2024 at 18:40, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND
APPLICATIONS INC] via gdal-dev wrote:
Hi,
We’ve encountered a few images with what seems like pathological
performance problems with polygonise. The details below are a report
from another developer that I haven’t yet independently verified.
We threshold a raster image to a binary mask in a memory dataset and use
that as its own mask band to mask out the background:
gdal.Polygonize(nn_mem_band, nn_mem_band, ogr_mem_lyr, -1)
We have a number of 32k x 32k raster images that feature a number of
very large same-valued regions (some as large as 80% of the entire
raster). We’re seeing ~10hrs on a modern workstation to complete the
line of code above. OpenCV can apparently construct a connected
components list in mere seconds, on the same workstation and image, so
we’re considering constructing the OGR geometries directly from those
as a temporary workaround.
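For readers unfamiliar with the connected-components step mentioned above, here is a minimal pure-Python sketch of what such a labelling pass does on a binary mask. It is not OpenCV's or GDAL's implementation; the function name `label_components`, the choice of 4-connectivity, and the tiny example mask are assumptions made only for illustration, and the mask is assumed to be a non-empty list of equal-length rows.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected regions of truthy cells.

    Hypothetical helper for illustration only.
    Returns (labels, count): labels has the same shape as mask, with 0 for
    background and 1..count for each connected region.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background cells and cells that already carry a label.
            if not mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4 direct neighbours.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Tiny example mask (assumed data): two separate foreground regions.
mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
labels, count = label_components(mask)
```

Each cell is visited a bounded number of times, so the pass is linear in the number of pixels; turning the labelled regions into polygon boundaries is a separate step that this sketch does not cover.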
Is this situation a known pitfall with the current algorithm / data
structures behind Polygonize?
I’m able to share the problematic tile(s) if of interest,
Best
Jesse
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.