Re: [gdal-dev] Thread-safe raster access

2024-06-05 Thread Andrew Bell via gdal-dev
Hi all,

Thanks for the thoughts.

From an algorithmic perspective, it's frustrating to have to wait to
read/write data. There are ways, up to a point, to avoid some of this with
the current interface, but it can make applications complicated and for
algorithms that can process blocks separately, having to wait on
reads/writes is unfortunate. I understand that some drivers will be
constrained due to the provided APIs, but I don't know that this needs to
hold back a more general implementation.

My thinking at this time is that some other interface is preferable to
trying to make the existing code thread-safe. It might make it easier to
write threaded applications as well. The current band model that allows
mixed reads and writes seems like it would complicate things. I wonder if
there are many applications that both read from and write to the same
raster.

GDAL is a long-lived system and it does many, many things well and provides
lots of options. Perhaps stripping away a certain amount of the current
flexibility would allow much of the driver code to be reused and progress
to be made in a more timely manner. The code doesn't have to do everything
to be useful.

Anyway, if you have further ideas, feel free to write me.

Thanks,


On Wed, Jun 5, 2024 at 4:31 PM Deyan Vasilev wrote:

> Hi,
> I've written a piece of software that uses multiple threads to fetch
> tiles out of a single MBTiles raster. Tiles go to a common cache
> which can be used by a "view" thread that assembles tiles on the
> screen in a seamless map.
>
> Each fetcher thread uses GDALOpenEx() and then RasterIO(GF_Read,,,) to
> get individual RGB bands out of the stored PNG tiles in the source
> MBTiles db. On the lowest level
> GDALGPKGMBTilesLikePseudoDataset::ReadTile() does a sqlite3 "SELECT
> tile_data FROM tiles..." with a db context which is unique to every
> fetcher thread.
>
> To recap, a multithreaded raster read on a single raster source
> is doable, probably made easier by the sqlite raster container that
> I use in my app.
>
> Best regards,
> Deyan
>
>
>
>
> On Mon, Jun 3, 2024 at 8:04 PM Even Rouault via gdal-dev
>  wrote:
> >
> > Andrew,
> >
> > what would be the purpose of thread-safe access: just making it
> thread-safe without any particular requirement on how efficient this would
> be (1), or hope for true concurrent access with ideally close to linear
> scalability with the number of threads (2) ?
> >
> > If (1), then we could add a GDALMutexedDataset class, similarly to
> https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/generic/ogrmutexeddatasource.h
> which exists on the vector side (just used by the FileGDB driver due to the
> fact that the underlying SDK is not even re-entrant), which uses the
> decorator pattern around all public API entry points to call the underlying
> dataset under a mutex. One could imagine having a GDAL_OF_THREADSAFE open
> flag that GDALOpen() would use to return such an instance. Shouldn't be too
> hard to implement, but probably not that useful IMHO. I can anticipate most
> users would have higher expectations than a mutex-based implementation.
> >
> > If (2), it seems to me that it would require a huge effort, and the
> programming language we use (C++) offers hardly any safety belt to make
> sure we don't make mistakes, the main one being forgetting to lock things
> that should be locked, or deadlock situations. If we go into doing that,
> I'm not even sure how we can reliably identify all parts of the code that
> must be modified.
> >
> > Neither GDAL raster core nor any driver is designed to be thread-safe.
> For core, at least gcore/gdalarraybandblockcache.cpp and
> gcore/gdalhashsetbandblockcache.cpp which interact with the block cache
> should be made thread-safe, and "just" adding a lock would defeat the aim
> to achieve linear scalability. The change in GDALDataset::RasterIO() I did
> in
> https://github.com/OSGeo/gdal/commit/7f3a0e582eb189744bc7cb8e4a751135edaecaf5
> isn't thread-safe either (would be easy to make thread-safe though).
> >
> > Once GDAL raster code is ready, the main challenge is making drivers
> themselves thread-safe. Raster drivers may directly read from a VSILFILE*
> handle, which isn't thread safe when using the standard Seek() + Read()
> pair. A few VSIVirtualFileSystem have a PRead() implementation, which is
> thread-safe, but not all). Or they rely on using some instance of a
> "reader" returned by a third-party library (libtiff, libjpeg, libpng,
> sqlite3, etc.) (which in most cases also uses a VSILFILE*), none of which
> are thread-safe (except sqlite3 that can be made thread-safe by passing a
> flag at sqlite3_open() time, that will basically applies strategy (1) by
> protecting all calls with a mutex). Perhaps using thread-specific instances
> of VSILFILE* and third-party "reader" objects could be a way of solving
> this. But realistically doing a pass in all GDAL drivers would be a
> multi-month-man to multi-year-man type of effort. A 

Re: [gdal-dev] Thread-safe raster access

2024-06-05 Thread Deyan Vasilev via gdal-dev
Hi,
I've written a piece of software that uses multiple threads to fetch
tiles out of a single MBTiles raster. Tiles go to a common cache
which can be used by a "view" thread that assembles tiles on the
screen in a seamless map.

Each fetcher thread uses GDALOpenEx() and then RasterIO(GF_Read,,,) to
get individual RGB bands out of the stored PNG tiles in the source
MBTiles db. On the lowest level
GDALGPKGMBTilesLikePseudoDataset::ReadTile() does a sqlite3 "SELECT
tile_data FROM tiles..." with a db context which is unique to every
fetcher thread.

To recap, a multithreaded raster read on a single raster source
is doable, probably made easier by the sqlite raster container that
I use in my app.

Best regards,
Deyan




On Mon, Jun 3, 2024 at 8:04 PM Even Rouault via gdal-dev
 wrote:
>
> Andrew,
>
> what would be the purpose of thread-safe access: just making it thread-safe 
> without any particular requirement on how efficient this would be (1), or 
> hope for true concurrent access with ideally close to linear scalability with 
> the number of threads (2) ?
>
> If (1), then we could add a GDALMutexedDataset class, similarly to 
> https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/generic/ogrmutexeddatasource.h
>  which exists on the vector side (just used by the FileGDB driver due to the 
> fact that the underlying SDK is not even re-entrant), which uses the 
> decorator pattern around all public API entry points to call the underlying 
> dataset under a mutex. One could imagine having a GDAL_OF_THREADSAFE open
> flag that GDALOpen() would use to return such an instance. Shouldn't be too hard
> to implement, but probably not that useful IMHO. I can anticipate most users 
> would have higher expectations than a mutex-based implementation.
>
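
As an illustration of strategy (1), here is a minimal, GDAL-free sketch of the
decorator-under-a-mutex pattern in Python. FakeDataset, MutexedDataset and
read_block() are hypothetical names invented for this example; they are not
GDAL classes or methods, and a real wrapper would have to cover every public
entry point.

```python
import threading


class FakeDataset:
    """Hypothetical stand-in for a dataset that is not thread-safe."""

    def __init__(self):
        self.reads = 0

    def read_block(self, x, y):
        self.reads += 1  # unsynchronised internal state, like a block cache
        return (x, y)


class MutexedDataset:
    """Decorator: exposes the same method, but runs it under one lock."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._lock = threading.Lock()

    def read_block(self, x, y):
        with self._lock:
            return self._wrapped.read_block(x, y)


ds = MutexedDataset(FakeDataset())


def worker():
    for i in range(1000):
        ds.read_block(i, i)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_reads = ds._wrapped.reads
```

Every call is serialized, which is what makes it safe and also why it cannot
scale with the number of threads.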
> If (2), it seems to me that it would require a huge effort, and the 
> programming language we use (C++) offers hardly any safety belt to make sure 
> we don't make mistakes, the main one being forgetting to lock things that 
> should be locked, or deadlock situations. If we go into doing that, I'm not
> even sure how we can reliably identify all parts of the code that must be
> modified.
>
> Neither GDAL raster core nor any driver is designed to be thread-safe. For
> core, at least gcore/gdalarraybandblockcache.cpp and 
> gcore/gdalhashsetbandblockcache.cpp which interact with the block cache 
> should be made thread-safe, and "just" adding a lock would defeat the aim to 
> achieve linear scalability. The change in GDALDataset::RasterIO() I did in 
> https://github.com/OSGeo/gdal/commit/7f3a0e582eb189744bc7cb8e4a751135edaecaf5 
> isn't thread-safe either (would be easy to make thread-safe though).
>
> Once GDAL raster code is ready, the main challenge is making drivers
> themselves thread-safe. Raster drivers may directly read from a VSILFILE*
> handle, which isn't thread-safe when using the standard Seek() + Read() pair
> (a few VSIVirtualFileSystem implementations have a PRead() method, which is
> thread-safe, but not all). Or they rely on using some instance of a "reader"
> returned by a third-party library (libtiff, libjpeg, libpng, sqlite3, etc.)
> (which in most cases also uses a VSILFILE*), none of which are thread-safe
> (except sqlite3 that can be made thread-safe by passing a flag at
> sqlite3_open() time, which will basically apply strategy (1) by protecting
> all calls with a mutex). Perhaps using thread-specific instances of VSILFILE*
> and third-party "reader" objects could be a way of solving this. But
> realistically doing a pass in all GDAL drivers would be a multi-man-month to
> multi-man-year type of effort. A realistic plan should be designed to allow
> combining (1) and (2): (2) for a few select drivers, and (1) as a fallback
> for most drivers that wouldn't be updated.
>
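
The difference between a Seek() + Read() pair and a positional read can be
shown with plain POSIX file descriptors. The sketch below uses Python's
os.pread (available on Unix-like systems) rather than the VSI API, so it is an
analogy only; the file layout and the reader() helper are invented for the
example.

```python
import os
import tempfile
import threading

RECORD = 16
path = os.path.join(tempfile.mkdtemp(), "blocks.bin")
with open(path, "wb") as f:
    for i in range(64):
        f.write(bytes([i]) * RECORD)  # record i is 16 copies of byte i

fd = os.open(path, os.O_RDONLY)  # one descriptor shared by all threads
errors = []


def reader(indices):
    for i in indices:
        # pread takes the offset explicitly and never moves the shared
        # file position, so concurrent callers cannot disturb each other.
        data = os.pread(fd, RECORD, i * RECORD)
        if data != bytes([i]) * RECORD:
            errors.append(i)


threads = [
    threading.Thread(target=reader, args=(range(k, 64, 4),)) for k in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
os.close(fd)
```

With os.lseek() followed by os.read() on the same descriptor, another thread
could move the file position between the two calls, which is the hazard
described for the standard Seek() + Read() pair.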
> Even
>
> On 03/06/2024 at 15:44, Andrew Bell via gdal-dev wrote:
>
> Hi,
>
> I am aware that there isn't thread-safe raster access with the current GDAL 
> interface for various reasons. Given the state of processors, I was wondering 
> if it would be valuable to take a look at providing the ability to do Raster 
> I/O (at least reads) in a thread-safe way. This could be done through a new 
> set of API calls or perhaps by modifications to what currently exists -- I 
> don't know what makes sense at this point. I would be happy to spend some 
> time looking at this if there is interest, but I would also like to learn 
> from existing experience as to what kinds of things that I'm surely not 
> considering would have to be dealt with.
>
> Thanks,
>
> --
> Andrew Bell
> andrew.bell...@gmail.com
>
> ___
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
>

Re: [gdal-dev] gdaladdo slowness for VRTs

2024-06-05 Thread Rahkonen Jukka via gdal-dev
Hi,

A better reference file can be created by materializing the VRT. The file
created by gdal_create has the same value in every pixel, whereas the
materialized one contains the original image data. In your test case with
uncompressed outputs the difference may not be large, but please test it
anyway.

gdal_translate -of GTiff -co tiled=yes -co compress=LZW input.vrt output.tif
gdaladdo -ro output.tif

What is the difference in speed now? A tenfold slowdown with the VRT
seems like quite a lot.

-Jukka Rahkonen-


From: gdal-dev On behalf of Denis Rykov via gdal-dev
Sent: Wednesday, 5 June 2024 4:36
To: gdal dev
Subject: [gdal-dev] gdaladdo slowness for VRTs

Hi,

I noticed that overview computation is slow and I'm wondering what the
reason could be.

$ gdal_create in.tif -if 20240602_230818_SN26_RR_VISUAL_MS.vrt
$ time gdaladdo -ro in.tif
gdaladdo -ro in.tif  4,25s user 6,23s system 34% cpu 30,623 total

But on the VRT itself, which has the same size and number of bands, it takes
much longer:

$ time gdaladdo -ro 20240602_230818_SN26_RR_VISUAL_MS.vrt
gdaladdo -ro   55,36s user 5,06s system 44% cpu 2:14,79 total
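
For scale, the two timings above can be compared directly. The short Python
sketch below only does the arithmetic on the figures printed by `time`; the
seconds() helper is made up for the example and assumes the comma is a decimal
separator and the colon separates minutes from seconds.

```python
def seconds(text):
    """Parse '30,623' or '2:14,79' (comma as decimal separator) into seconds."""
    total = 0.0
    for part in text.replace(",", ".").split(":"):
        total = total * 60 + float(part)
    return total


tif = {"user": seconds("4,25"), "total": seconds("30,623")}
vrt = {"user": seconds("55,36"), "total": seconds("2:14,79")}

wall_ratio = vrt["total"] / tif["total"]  # elapsed time: roughly 4.4x
user_ratio = vrt["user"] / tif["user"]    # user CPU time: roughly 13x
```

So the VRT run takes about 4.4 times longer in elapsed time and about 13 times
more user CPU time than the run on in.tif.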

Here is the output of gdalinfo of the VRT:

$ gdalinfo 20240602_230818_SN26_RR_VISUAL_MS.vrt
Driver: VRT/Virtual Raster
Files: 20240602_230818_SN26_RR_VISUAL_MS.vrt
   20240602_230818_SN26_RR_VISUAL_MS_340_5020.tif
   20240602_230818_SN26_RR_VISUAL_MS_340_5040.tif
   20240602_230818_SN26_RR_VISUAL_MS_360_5020.tif
   20240602_230818_SN26_RR_VISUAL_MS_360_5040.tif
   20240602_230818_SN26_RR_VISUAL_MS_360_5060.tif
Size is 20779, 46754
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 59S",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["UTM zone 59S",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",171,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",10000000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["easting",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["northing",north,
ORDER[2],
LENGTHUNIT["metre",1]],
ID["EPSG",32759]]
Data axis to CRS axis mapping: 1,2
Origin = (353999.7988358,5068000.62999888241)
Pixel Size = (0.770,-0.770)
Corner Coordinates:
Upper Left  (  353999.800, 5068000.630) (169d 9'45.50"E, 44d31'35.55"S)
Lower Left  (  353999.800, 5032000.050) (169d 9' 8.52"E, 44d51' 1.68"S)
Upper Right (  369999.630, 5068000.630) (169d21'50.06"E, 44d31'46.58"S)
Lower Right (  369999.630, 5032000.050) (169d21'17.12"E, 44d51'12.83"S)
Center      (  361999.715, 5050000.340) (169d15'30.36"E, 44d41'24.33"S)
Band 1 Block=128x128 Type=Byte, ColorInterp=Red
  NoData Value=0
Band 2 Block=128x128 Type=Byte, ColorInterp=Green
  NoData Value=0
Band 3 Block=128x128 Type=Byte, ColorInterp=Blue
  NoData Value=0
Band 4 Block=128x128 Type=Byte, ColorInterp=Undefined
  NoData Value=0

And gdalinfo output for one of the underlying rasters:

gdalinfo 20240602_230818_SN26_RR_VISUAL_MS_360_5040.tif
Driver: GTiff/GeoTIFF
Files: 20240602_230818_SN26_RR_VISUAL_MS_360_5040.tif
Size is 10390, 25974
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 59S",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["UTM zone 59S",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",171,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",10000000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],