Hi, R*Tree is standard SQLite module, so maybe it is best to look at the SQLite documentation https://www.sqlite.org/rtree.html. Finding Spatialite documents from the website is always difficult (but it is Open Source and anybody can offer their help). Spatialite documents contain some background information and examples and they are worth reading. You could start from a historic https://www.gaia-gis.it/gaia-sins/spatialite-tutorial-2.3.1.html#t8. The current documentation about using spatial index with Spatialite is https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex, but it describes the usage of the alternative way to access R*Tree tables with VirtualSpatialIndex https://www.gaia-gis.it/gaia-sins/SpatialIndex-Update.pdf that works only with Spatialite databases. With GeoPackage R*Tree tables must be queried directly.
-Jukka Rahkonen- ________________________________________ Lähettäjä: gdal-dev <[email protected]> käyttäjän Even Rouault via gdal-dev <[email protected]> puolesta Lähetetty: Torstai 25. syyskuuta 2025 3.01 Vastaanottaja: Tim Harris <[email protected]>; gdal dev <[email protected]> Aihe: Re: [gdal-dev] Large GPKG files in S3 Hi Tim, yes you're running into a well known s/limitation/feature/ of spatial SQLite (I'm pretty sure that the Spatialite website has one page about that, but can't find it right now). The use of the ST_ functions doesn't trigger the RTree automatically, and you need to query it explicitly by joining with the rtree_{table_name}_{geom_column_name} virtual table. For example the following completes quite quickly: $ ogrinfo -ro -al -q /vsicurl/https://storage.googleapis.com/open-geodata/linz-examples/nz-building-outlines.gpkg -sql "SELECT * FROM nz_building_outlines p JOIN rtree_nz_building_outlines_geom rt ON p.rowid = rt.id WHERE rt.minx < 1771150 and rt.maxx > 1771140 and rt.miny < 5894877 and rt.maxy > 5894870 and ST_Intersects(p.geom, ST_GeomFromText('POINT(1771141 5894876)')) = 1" Layer name: SELECT OGRFeature(SELECT):11471 building_id (Integer) = 4199821 name (String) = use (String) = Unknown suburb_locality (String) = Karaka town_city (String) = territorial_authority (String) = Auckland capture_method (String) = Feature Extraction capture_source_group (String) = NZ Aerial Imagery capture_source_id (Integer) = 1014 capture_source_name (String) = Auckland 0.075m Urban Aerial Photos (2017) capture_source_from (Date) = 2017/03/15 capture_source_to (Date) = 2017/05/06 last_modified (Date) = 2019/04/05 id (Integer64) = 11471 minx (Real) = 1771139.875 maxx (Real) = 1771152 miny (Real) = 5894865.5 maxy (Real) = 5894883 MULTIPOLYGON (((1771139.91348946 5894880.21926274,1771146.68318411 5894882.80032223,1771151.97657734 5894868.91662014,1771145.02123927 5894866.26478195,1771143.49962591 5894870.255725,1771142.39909482 5894869.83612922,1771140.01981817 5894876.07658206,1771141.30599197 5894876.56695731,1771139.91348946 5894880.21926274))) Even Le 25/09/2025 à 01:38, Tim Harris via gdal-dev a écrit : > Hi, trying to do some queries of fairly large GPKG files sitting in S3 > in the hopes that spatial indexes and S3 range requests make it fast > and efficient. But it seems that's not the case and I'm wondering if > I'm doing something wrong. > > The examples below are just looking for a geometry at a point. The > filename is fake but the geoms.gpkg contains a layer "geoms" with a > bunch of polygons. It's about 1.7 GB in size. > > Something like this runs relatively fast, it makes about 17 HTTP > requests and outputs the feature: > > --- > ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -spat -105 > 40 -105 40 > --- > > But this, using -sql instead of -spat, sits forever without finishing. > With CPL_CURL_VERBOSE it sort of looks like it's just scanning through > the file a range request at a time: > > --- > ogrinfo -ro -json -features /vsis3/bucket/geoms.gpkg geoms -sql > "select * from geoms where ST_Intersects(geom, > ST_GeomFromText('POINT(-105 40)', 4326)) = 1" > --- > > I also tried replacing the ST_GeomFromText with other things > like MakePoint(-105, 40, 4326) and ST_EnvIntersects(geom, -105, 40, > -105, 40), just in case. It seems like -spat uses the spatial index > efficiently but -sql doesn't. I also tried using them both, but having > the -sql there seems to make it slow. > > I also tried to add "explain query plan" to the sql statement: > > --- > ogrinfo -ro > /vsis3/prod-tilerepo-va2/metadata/20250920T041443/geoms.gpkg geoms > -sql "EXPLAIN QUERY PLAN select * from geoms where ST_Intersects(geom, > ST_GeomFromText('POINT(-105 40)', 4326)) = 1" > --- > > That only sort of confirms the scan, and reports "SCAN geoms" in the > output. This page, https://sqlite.org/eqp.html, explains the "explain > query plan" output and I was hoping to see a "SEARCH geoms USING INDEX > ___" of some sort. > > Is there something about the use of -sql that precludes it from using > the spatial index? Maybe we need to do an ogr2ogr with just -spat to > narrow it down, then apply a -sql to that result if we need additional > filtering? > > Thanks > > _______________________________________________ > gdal-dev mailing list > [email protected] > https://lists.osgeo.org/mailman/listinfo/gdal-dev -- http://www.spatialys.com/ My software is free, but my time generally not. _______________________________________________ gdal-dev mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/gdal-dev _______________________________________________ gdal-dev mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/gdal-dev
