Hi Pete,
Those are good questions.
I am confused by the large number of curl range requests when using
the new multithreaded reading. Some questions:
- with GDAL_NUM_THREADS=1 and GDAL_HTTP_MULTIRANGE=YES, "each range
will be requested in parallel, using several HTTP connections"... are
those requests multithreaded?
Not multi-threaded, but using the curl multi interface
(https://curl.se/libcurl/c/libcurl-multi.html), which allows starting
several connections in parallel within the same user-space thread (not
sure what the kernel/OS does behind the scenes) and listening on the
corresponding file descriptors/network handles to collect responses as
soon as they arrive. The GeoTIFF driver then waits for all those queued
requests to have returned their results before continuing its processing.
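
To make the "parallel but not multi-threaded" point concrete, here is a small Python analogy. It is only a sketch: it uses asyncio with simulated latency instead of libcurl, and the names fetch_range, fetch_all and run_demo are made up for illustration. It shows several requests in flight at once on a single thread, with the caller waiting until all of them have completed.

```python
import asyncio
import threading


async def fetch_range(offset, size, delay, thread_ids):
    # Hypothetical stand-in for one in-flight HTTP range request.
    # The sleep simulates network latency; no real I/O is performed.
    await asyncio.sleep(delay)
    # Record which OS thread handled the completion.
    thread_ids.add(threading.get_ident())
    return (offset, size)


async def fetch_all(ranges):
    thread_ids = set()
    # Start every request before waiting on any of them.
    tasks = [fetch_range(off, size, 0.05, thread_ids) for off, size in ranges]
    # Block until all queued requests have returned, in request order.
    results = await asyncio.gather(*tasks)
    return results, thread_ids


def run_demo():
    # Three 8 KB ranges, chosen arbitrarily for the demonstration.
    ranges = [(0, 8192), (8192, 8192), (16384, 8192)]
    return asyncio.run(fetch_all(ranges))


if __name__ == "__main__":
    results, thread_ids = run_demo()
    print(results)
    print("threads used:", len(thread_ids))
```

All three simulated requests overlap in time, yet only one thread is ever involved, which is the same general shape as what the multi interface provides.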
- Is it a bad idea to use multithreaded reads and
GDAL_HTTP_MULTIRANGE=YES when data is accessed with /vsicurl/ served
by HTTP/1.1? I am guessing the GTiff multithreaded reads are
splitting up contiguous byte ranges to be noncontiguous, which may
yield worse performance on some virtual filesystems.
What you've experienced with multithreaded reads and HTTP reads is
covered by the suggestion in https://github.com/OSGeo/gdal/issues/6456. So
basically the multi-threaded optimization currently works best for reading
local files. Reading small_world.tif over the network is a bit of a
worst case here, as it is definitely not a cloud-optimized file: it has
a strip organization, with each strip being only 8 KB large. So
the current absence of range merging in the multithreaded GTiff
decoding code path particularly hurts for that use case. For normally
tiled files, it shouldn't be that bad.
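
For illustration, this is roughly what "range merging" means. The sketch below is not GDAL's code; merge_ranges and its max_gap parameter are hypothetical names. It coalesces (offset, size) byte ranges that touch or overlap, so that a run of small adjacent strips can be fetched with one request instead of many.

```python
def merge_ranges(ranges, max_gap=0):
    """Coalesce (offset, size) byte ranges.

    Two ranges are merged when the second one starts no more than
    max_gap bytes after the end of the first one.
    """
    merged = []
    for offset, size in sorted(ranges):
        if merged and offset <= merged[-1][0] + merged[-1][1] + max_gap:
            # Extend the previous range to cover this one.
            prev_offset, prev_size = merged[-1]
            end = max(prev_offset + prev_size, offset + size)
            merged[-1] = (prev_offset, end - prev_offset)
        else:
            # Gap too large: start a new range.
            merged.append((offset, size))
    return merged


if __name__ == "__main__":
    # Four contiguous 8 KB strips starting at an arbitrary offset of 8.
    strips = [(8 + i * 8192, 8192) for i in range(4)]
    print(merge_ranges(strips))
```

With contiguous 8 KB strips like the ones in the example, the four ranges collapse into a single 32 KB request. Without such a step, each strip becomes its own HTTP range request.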
Even
--
http://www.spatialys.com
My software is free, but my time generally not.