Hi Pete,

Those are good questions.

I am confused by the large number of curl range requests when using the new multithreaded reading.  Some questions:

- with GDAL_NUM_THREADS=1 and GDAL_HTTP_MULTIRANGE=YES, "each range will be requested in parallel, using several HTTP connections"... are those requests multithreaded?
Not multi-threaded, but it uses the curl multi interface (https://curl.se/libcurl/c/libcurl-multi.html), which makes it possible to start several connections in parallel within the same user-space thread (not sure what the kernel/OS does behind the scenes) and to listen on the corresponding file descriptors/network handles to collect responses as soon as they arrive. The GeoTIFF driver then waits for all those queued requests to have returned their results before continuing its processing.
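
To illustrate the "several requests in flight, one thread" pattern, here is a minimal Python sketch. It is only an analogy built on asyncio, not libcurl or GDAL code: fetch_range() is a hypothetical stand-in that simulates latency instead of doing any network I/O, and the byte ranges are made up.

```python
import asyncio
import threading


async def fetch_range(offset, size, thread_ids):
    # Hypothetical stand-in for one HTTP range request; no network I/O happens.
    thread_ids.add(threading.get_ident())
    await asyncio.sleep(0.01)  # simulated network latency
    return (offset, size)


async def fetch_all(ranges):
    thread_ids = set()
    # Start every request, then wait until all of them have returned,
    # similar in spirit to the driver waiting for its queued requests.
    results = await asyncio.gather(
        *(fetch_range(offset, size, thread_ids) for offset, size in ranges)
    )
    return results, thread_ids


ranges = [(0, 8192), (8192, 8192), (16384, 8192)]  # made-up (offset, size) pairs
results, thread_ids = asyncio.run(fetch_all(ranges))
print(results, len(thread_ids))
```

All three simulated requests overlap in time, yet only one thread identifier is ever recorded.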
- Is it a bad idea to use multithreaded reads and GDAL_HTTP_MULTIRANGE=YES when data is accessed with /vsicurl/ served by HTTP/1.1?  I am guessing the GTiff multithreaded reads are splitting up contiguous byte ranges to be noncontiguous, which may yield worse performance on some virtual filesystems.

What you've experienced with multithreaded reads over HTTP is covered by the suggestion in https://github.com/OSGeo/gdal/issues/6456. So basically, for now the multi-threaded optimization works best when reading local files. Reading small_world.tif over the network is a bit of a worst case here, as it is definitely not a cloud-optimized file: it has a strip organization, and each strip is only 8 KB large. So the current absence of range merging in the multithreaded GTiff decoding code path particularly hurts for that use case. For normally tiled files, it shouldn't be that bad.
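
As a rough idea of what range merging means here, the following Python sketch coalesces adjacent (offset, size) byte ranges into a single request. It is not GDAL's implementation: merge_ranges() is a hypothetical helper, and the strip offsets are invented (real strips would not start at offset 0); only the 8 KB strip size comes from the case above.

```python
def merge_ranges(ranges):
    """Merge (offset, size) byte ranges that touch or overlap."""
    merged = []
    for offset, size in sorted(ranges):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This range starts at or before the end of the previous one:
            # extend the previous range instead of issuing a new request.
            last_offset, last_size = merged[-1]
            end = max(last_offset + last_size, offset + size)
            merged[-1] = (last_offset, end - last_offset)
        else:
            merged.append((offset, size))
    return merged


# Four contiguous 8 KB strips (offsets are illustrative only).
strips = [(i * 8192, 8192) for i in range(4)]
merged = merge_ranges(strips)
print(merged)
```

With merging, the four contiguous strips collapse into one 32 KB range; without it, each strip costs its own HTTP request.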

Even

--

http://www.spatialys.com
My software is free, but my time generally not.