
I've had a chance to look at your test dataset. In https://github.com/OSGeo/gdal/pull/10477, I've reduced the runtime to 8 minutes (with GeoParquet output, without spatial sorting) by optimizing some implementation details. I believe this could be reduced further, since most of the time is still spent in malloc/free of temporary objects (the output is 90 million polygons!) and some of those objects could be reused, but that would require more extensive changes.


On 01/07/2024 at 18:40, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev wrote:


We’ve encountered a few images with what seems like pathological performance problems with polygonise. The details below are a report from another developer that I haven’t yet independently verified.

We threshold a raster image to a binary mask in a memory dataset, then use that band as its own mask to mask out the background.

gdal.Polygonize(nn_mem_band, nn_mem_band, ogr_mem_lyr, -1)

We have a number of 32k x 32k raster images that feature a number of very large same-valued regions (some as large as 80% of the entire raster). We’re seeing ~10hrs on a modern workstation to complete the line of code above. OpenCV can apparently construct a connected components list in mere seconds, on the same workstation and image, so we’re considering constructing the OGR geometries directly from those as a temporary workaround.
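For readers unfamiliar with the connected-components step mentioned above, here is a minimal pure-Python sketch of what such a labelling computes on a binary mask. It is an illustration only: it is neither OpenCV's implementation nor the algorithm behind Polygonize, and the function name label_components is hypothetical. It assumes 4-connectedness, which is also the default connectedness used by Polygonize.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected regions of truthy cells in a 2D list.

    Hypothetical helper for illustration; returns (labels, count), where
    labels[r][c] is 0 for background and 1..count for each region.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background cells and cells that already have a label.
            if not mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4 direct neighbours.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    demo = [
        [1, 1, 0],
        [0, 0, 0],
        [0, 1, 1],
    ]
    demo_labels, demo_count = label_components(demo)
    print(demo_count)
    for row in demo_labels:
        print(row)
```

Each cell is visited a bounded number of times, so the labelling itself is linear in the number of pixels. Turning the labelled regions into polygon boundaries is a separate step that this sketch does not attempt.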

Is this situation a known pitfall with the current algorithm / data structures behind Polygonize?

I’m able to share the problematic tile(s) if they are of interest.



gdal-dev mailing list

My software is free, but my time generally not.