Hi, I am experimenting with curl range requests on servers supporting HTTP/1.1 and the new GTiff multi-threaded read released in gdal-3.6.0 from https://github.com/OSGeo/gdal/pull/6438.
First, here's a command that uses single-threaded reads with HTTP/1.1. Note the default value GDAL_HTTP_MULTIRANGE=YES (https://trac.osgeo.org/gdal/wiki/ConfigOptions#GDAL_HTTP_MULTIRANGE). For HTTP/1.1, I expect each range will be requested in parallel, using several HTTP connections.

    env GDAL_NUM_THREADS=1 CPL_CURL_VERBOSE=1 GDAL_HTTP_VERSION=1.1 \
        GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=tif \
        gdal_translate \
        /vsicurl/https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/small_world.tif \
        small_world_jpeg.tif -co tiled=yes -co compress=jpeg -co photometric=ycbcr \
        2>&1 | grep "Content-Range"

    < Content-Range: bytes 0-16383/240574
    < Content-Range: bytes 229376-240573/240574
    < Content-Range: bytes 16384-81919/240574
    < Content-Range: bytes 81920-212991/240574
    < Content-Range: bytes 212992-229375/240574

I then try the same thing using 2 threads. It is quite a bit slower, and there are many more range requests than I would anticipate:

    time env GDAL_NUM_THREADS=2 CPL_CURL_VERBOSE=1 GDAL_HTTP_VERSION=1.1 \
        GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=tif \
        gdal_translate \
        /vsicurl/https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/small_world.tif \
        small_world_jpeg_multi.tif -co tiled=yes -co compress=jpeg -co photometric=ycbcr \
        2>&1 | grep "Content-Range"

    < Content-Range: bytes 0-16383/240574
    < Content-Range: bytes 229376-240573/240574
    < Content-Range: bytes 232008-240007/240574
    < Content-Range: bytes 88008-96007/240574
    < Content-Range: bytes 152008-160007/240574
    < Content-Range: bytes 72008-80007/240574
    < Content-Range: bytes 224008-232007/240574
    < Content-Range: bytes 144008-152007/240574
    < Content-Range: bytes 64008-72007/240574
    < Content-Range: bytes 216008-224007/240574
    < Content-Range: bytes 136008-144007/240574
    < Content-Range: bytes 56008-64007/240574
    < Content-Range: bytes 208008-216007/240574
    < Content-Range: bytes 128008-136007/240574
    < Content-Range: bytes 48008-56007/240574
    < Content-Range: bytes 200008-208007/240574
    < Content-Range: bytes 120008-128007/240574
    < Content-Range: bytes 40008-48007/240574
    < Content-Range: bytes 192008-200007/240574
    < Content-Range: bytes 112008-120007/240574
    < Content-Range: bytes 32008-40007/240574
    < Content-Range: bytes 184008-192007/240574
    < Content-Range: bytes 104008-112007/240574
    < Content-Range: bytes 24008-32007/240574
    < Content-Range: bytes 176008-184007/240574
    < Content-Range: bytes 96008-104007/240574
    < Content-Range: bytes 16008-24007/240574
    < Content-Range: bytes 168008-176007/240574
    < Content-Range: bytes 8008-16007/240574
    < Content-Range: bytes 160008-168007/240574
    < Content-Range: bytes 80008-88007/240574
    < Content-Range: bytes 8-8007/240574

I am confused by the large number of curl range requests when using the new multithreaded reading. Some questions:

- With GDAL_NUM_THREADS=1 and GDAL_HTTP_MULTIRANGE=YES, "each range will be requested in parallel, using several HTTP connections"... are those requests multithreaded?
- Is it a bad idea to use multithreaded reads and GDAL_HTTP_MULTIRANGE=YES when data is accessed with /vsicurl/ served by HTTP/1.1? I am guessing the GTiff multithreaded reads are splitting up contiguous byte ranges into noncontiguous ones, which may yield worse performance on some virtual filesystems.

Thanks,
Pete
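
P.S. In case it helps anyone compare the two logs, here is a minimal Python sketch that tallies the "Content-Range" lines: how many requests were made, how many bytes were transferred, and how many of those bytes were fetched more than once. It is not part of GDAL; the function names (parse_ranges, summarize) and the script name used below are my own, and it assumes the log lines look exactly like the grep output above.

```python
import re
import sys

# Matches lines such as "< Content-Range: bytes 0-16383/240574".
RANGE_RE = re.compile(r"Content-Range: bytes (\d+)-(\d+)/(\d+)")


def parse_ranges(text):
    """Return (start, end) byte offsets, end inclusive, in log order."""
    return [(int(m.group(1)), int(m.group(2))) for m in RANGE_RE.finditer(text)]


def summarize(ranges):
    """Count requests, bytes transferred, and distinct bytes covered."""
    transferred = sum(end - start + 1 for start, end in ranges)

    # Merge overlapping or adjacent ranges to find the distinct bytes read.
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    unique = sum(end - start + 1 for start, end in merged)

    return {
        "requests": len(ranges),
        "bytes_transferred": transferred,
        "unique_bytes": unique,
        "refetched_bytes": transferred - unique,
    }


if __name__ == "__main__":
    # Read the grep output from stdin and print the summary.
    print(summarize(parse_ranges(sys.stdin.read())))
```

Saved as, say, summarize_ranges.py, it can be appended to either pipeline above: `... 2>&1 | grep "Content-Range" | python3 summarize_ranges.py`. For the two logs above it reports 5 requests and 240574 bytes transferred for the single-threaded run, versus 32 requests and 267582 bytes transferred (27008 of them fetched twice) for the 2-thread run.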
_______________________________________________ gdal-dev mailing list gdal-dev@lists.osgeo.org https://lists.osgeo.org/mailman/listinfo/gdal-dev