On 23/07/2024 at 21:08, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] wrote:

Excellent, thanks Even!  Do you recall what the runtime was before these changes on your test system?

I killed the process after about half an hour. I don't recall how far it had progressed, maybe 40%-50%.

From: Even Rouault <even.roua...@spatialys.com>
Date: Tuesday, July 23, 2024 at 3:00 PM
To: Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] <jesse.r.me...@nasa.gov>, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev <gdal-dev@lists.osgeo.org>
Subject: [EXTERNAL] Re: [gdal-dev] Expected runtime of polygonize (GDAL 3.9.0) for few very large features.

Hi,

I've had a chance to look at your test dataset. In https://github.com/OSGeo/gdal/pull/10477, I've reduced the runtime to 8 minutes (with GeoParquet output, without spatial sorting) by optimizing some implementation details. I believe this could be reduced further, as most of the time is still spent in malloc/free of temporary objects (the output is 90 million polygons!) and some objects could be reused, but that would require more extensive changes.

Even

On 01/07/2024 at 18:40, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev wrote:

    Hi,

    We’ve encountered a few images with what seems like pathological
    performance problems with polygonise.  The details below are a
    report from another developer that I haven’t yet independently
    verified.

    We threshold a raster image to a binary mask in a memory dataset,
    then use that band as its own mask to mask out the background.

    gdal.Polygonize(nn_mem_band, nn_mem_band, ogr_mem_lyr, -1)

    We have a number of 32k x 32k raster images that feature a number of
    very large same-valued regions (some as large as 80% of the entire
    raster).  We’re seeing ~10hrs on a modern workstation to complete
    the line of code above. OpenCV can apparently construct a
    connected components list in mere seconds, on the same workstation
    and image, so we’re considering constructing the OGR geometries
    directly from those as a temporary workaround.
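
For readers unfamiliar with the term, here is a minimal pure-Python sketch of what thresholding to a binary mask and building a connected components list involves. It is not the OpenCV or GDAL implementation: the helper names `threshold` and `label_components`, the sample values, the cutoff, and the choice of 4-connectivity are all assumptions made for illustration.

```python
from collections import deque


def threshold(image, cutoff):
    """Return a binary mask: 1 where the pixel value exceeds cutoff, else 0."""
    return [[1 if value > cutoff else 0 for value in row] for row in image]


def label_components(mask):
    """Label 4-connected regions of non-zero cells with a breadth-first fill.

    Returns (labels, count): labels has the same shape as mask, with 0 for
    background and 1..count for each region, numbered in scan order.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue  # background, or already part of a labelled region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Visit the four edge-sharing neighbours (illustrative choice).
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical 3 x 4 image; values above 5 count as foreground.
image = [
    [9, 9, 0, 0],
    [9, 0, 0, 7],
    [0, 0, 7, 7],
]
mask = threshold(image, 5)
labels, count = label_components(mask)
print(count)
for row in labels:
    print(row)
```

Each pixel is visited a bounded number of times, so labelling is linear in the number of pixels. Turning the labelled regions into polygon boundaries is additional work that this sketch does not attempt.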

    Is this situation a known pitfall with the current algorithm /
    data structures behind Polygonize?

    I’m able to share the problematic tile(s) if of interest,

    Best

    Jesse



    _______________________________________________
    gdal-dev mailing list
    gdal-dev@lists.osgeo.org
    https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.
