I've been working in the gdal-dev-env (version 3.1.0, installed around 
mid-December) on OSGeo4W, mostly because it's faster than making COGs using the 
GTiff driver. I'm working on large (e.g. 102600x91100) orthophoto rasters, 
generating VRTs, TIFFs and COGs.

While I can do LZW, DEFLATE, and uncompressed just fine (2 minutes with all 
cores to make an LZW COG from a VRT), I'm struggling to make JPEG COGs. If I run 
a loop, I can't make it through more than one image without gdal_translate 
hanging at the finish, sometimes for tens of hours. If I kill the process 
(CTRL-C doesn't always work, but Task Manager does) then the resulting COG is fine 
(same size as if I wait n hours and the process finishes). Over the last few 
years I've had this issue (gdal_translate hanging at "100 - done.") on many 
large rasters even when building as TIFF. Also maybe worth noting, even on 
smaller rasters I often see GDAL hang for minutes to tens of minutes at the end 
of a raster build. In the past I was only building single rasters though, 
so it wasn't that big of a deal - I could just kill the process. Not any more. I 
frequently build several at a time and hope to scale up.

I'm running on a Threadripper 3960X with 256GB RAM that I built. All processing 
is on an NVMe drive. The LZW compressed TIFFs (COGs) are around 1.5 - 3GB 
(8-bit RGB with mask band). If I build with CPL_DEBUG=ON, depending on cachemax 
size, I see "potential thrashing on band one of ." at around 10-20% (even with 
GDAL_CACHEMAX at 80%), and if not set high enough I'm stuck at 20% for hours 
and hours. Then gdal hangs at "100 - done." for anywhere from 2 - 12+ hours 
unless I kill it. If I kill the process, the final raster builds out and 
appears to work fine, and is the same as if I wait X hours for it to exit. For 
a test with debug on I just finished, after 2.5h hung at "done" I got this line:

GDAL: GDALClose(<outfile.tif.ovr.tmp, this=000001FDC5531C50)

Another 45 minutes later, the input and output TIFFs closed and the shared 
library unloaded, after RAM usage slowly drained from ~30 GB over that time.
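
For a sense of scale, here is a rough back-of-the-envelope calculation of the 
uncompressed size of one of these rasters. It is only a sketch: it assumes the 
102600x91100 example above is 8-bit with one byte per sample, and that the mask 
counts as a full fourth band in memory, which may not match how GDAL actually 
stores it. The function name is just for illustration.

```python
def uncompressed_bytes(width, height, bands, bytes_per_sample=1):
    """Raw size of a raster in bytes, ignoring compression and overviews."""
    return width * height * bands * bytes_per_sample


# Example dimensions from this post; band counts are assumptions.
rgb_only = uncompressed_bytes(102600, 91100, bands=3)
rgb_plus_mask = uncompressed_bytes(102600, 91100, bands=4)

print(f"RGB only:      {rgb_only / 1e9:.1f} GB")
print(f"RGB plus mask: {rgb_plus_mask / 1e9:.1f} GB")
```

So the pixel data alone is in the tens of GB before any overviews are built.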

My overall command at the moment is:

gdal_translate .\<infile.tif> <outfile.tif> -of COG -co COMPRESS=JPEG -co 
QUALITY=90 -config GDAL_CACHEMAX "80%" -config GDAL_SWATH_SIZE "80%" -config 
GDAL_FORCE_CACHING YES -config GDAL_MAX_DATASET_POOL_SIZE 2048
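
To make the loop case concrete, below is a minimal Python sketch of how the 
command above could be driven over several files. The helper names 
(build_cog_command, run_batch) and the optional timeout are hypothetical and 
only for illustration; the options are the ones from my command, written with 
the --config spelling used in the GDAL documentation. Note that a timeout 
kills gdal_translate, and the only evidence that the output survives a kill is 
what I describe in this post.

```python
import subprocess
import time


def build_cog_command(infile, outfile, quality=90, cachemax="80%"):
    """Return the gdal_translate argument list for a JPEG COG (hypothetical helper)."""
    return [
        "gdal_translate", str(infile), str(outfile),
        "-of", "COG",
        "-co", "COMPRESS=JPEG",
        "-co", f"QUALITY={quality}",
        "--config", "GDAL_CACHEMAX", cachemax,
        "--config", "GDAL_SWATH_SIZE", cachemax,
        "--config", "GDAL_FORCE_CACHING", "YES",
        "--config", "GDAL_MAX_DATASET_POOL_SIZE", "2048",
    ]


def run_batch(pairs, timeout_s=None):
    """Run gdal_translate for each (infile, outfile) pair and time each run.

    If timeout_s is set, a run that exceeds it is killed and recorded as
    timed out (assumption: the caller accepts the risk of killing the process).
    """
    results = []
    for infile, outfile in pairs:
        start = time.monotonic()
        try:
            proc = subprocess.run(build_cog_command(infile, outfile),
                                  timeout=timeout_s)
            status = proc.returncode
        except subprocess.TimeoutExpired:
            status = "timeout"
        results.append((outfile, status, time.monotonic() - start))
    return results
```

Something like run_batch([("a.tif", "a_cog.tif")], timeout_s=3600) would then 
report per-file status and elapsed seconds, which at least makes the time 
spent hung after "100 - done." measurable.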

With lower values (and possibly if I get rid of the GDAL_FORCE_CACHING YES 
variable - I just added that), the same "hang" at 100% lasts even longer. 
Again, the same COG builds in 2 minutes with LZW, but with JPEG and all 
the cachemax settings ramped up, it takes maybe 6 hours.

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
