Thanks for trying out accessing FlatGeobuf via HTTP. For the record, I've been slightly aware of this particular efficiency problem and I aim to improve it when I can get to it, because this is a use case where I definitely want FlatGeobuf to take first place. :)
/Björn

On Thu, 24 Oct 2019 at 20:05, Even Rouault <even.roua...@spatialys.com> wrote:

> On Thursday 24 October 2019 17:42:23 CEST Rahkonen Jukka (MML) wrote:
> > Hi,
> >
> > I was experimenting with accessing some vector files through http
> > (same data as FlatGeoBuffers, GeoPackage, and shapefile). The file
> > size in each format was about 850 MB and the amount of data was
> > about 240000 linestrings. I made an ogrinfo request with a spatial
> > filter that selects one feature and checked the number of http
> > requests and the amount of requested data.
> >
> > FlatGeoBuffers
> > 19 http requests
> > 33046509 bytes read
>
> Looking at the debug log, FlatGeoBuf currently loads the whole
> index-of-features array ("Reading feature offsets index"), which
> accounts for 32.7 MB of the above 33 MB. This could probably be
> avoided by only loading the offsets of the selected features. The
> shapefile driver had the same issue a few years ago, and this was
> fixed by initializing the offset array to zeroes and loading the
> offsets on demand when needed.
>
> > If somebody really finds a use case for reading vector data from
> > the web it seems obvious that having a possibility to cache and
> > re-use the spatial index would be very beneficial. I can imagine
> > that with shapefile it would mean downloading the .qix file, with
> > GeoPackage reading the contents of the rtree index table, and with
> > FlatGeoBuffers probably extracting the Static packed Hilbert R-tree
> > index.
>
> A general caching logic in /vsicurl/ would be preferable (although
> the download of the 'data' part of files might potentially evict the
> indexes, but having a dedicated logic in each driver to tell which
> files / regions of the files should be cached would be a bit
> annoying). Basically, doing a HEAD request on the file to get its
> last update date, and having a local cache of downloaded pieces,
> would be a more general solution.
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
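
For illustration, the on-demand offset loading that Even describes for the shapefile driver could be sketched roughly as follows. This is only a sketch under stated assumptions, not GDAL or FlatGeobuf code: the class `LazyOffsetIndex`, the `read_range` callback, the 8-byte little-endian entry size, the 16-byte header and the block size are all hypothetical, and a dictionary of fetched blocks stands in for the zero-initialized offset array. A real `read_range` would issue an HTTP range request; here it reads from an in-memory buffer so the example runs without a network.

```python
import struct
from typing import Callable, Dict


class LazyOffsetIndex:
    """Feature-offset table whose entries are fetched only when first used."""

    ENTRY_SIZE = 8  # assumption: one little-endian uint64 per feature

    def __init__(self, read_range: Callable[[int, int], bytes],
                 table_start: int, count: int, block_entries: int = 1024):
        self._read_range = read_range        # (byte offset, length) -> bytes
        self._table_start = table_start      # where the offset table begins
        self._count = count                  # number of features
        self._block_entries = block_entries  # entries fetched per request
        self._blocks: Dict[int, bytes] = {}  # block number -> raw bytes
        self.bytes_fetched = 0               # running total, for comparison

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self._count:
            raise IndexError(i)
        block_no, within = divmod(i, self._block_entries)
        block = self._blocks.get(block_no)
        if block is None:
            # Fetch only the block that contains entry i, then keep it.
            first = block_no * self._block_entries
            n = min(self._block_entries, self._count - first)
            block = self._read_range(
                self._table_start + first * self.ENTRY_SIZE,
                n * self.ENTRY_SIZE)
            self._blocks[block_no] = block
            self.bytes_fetched += len(block)
        return struct.unpack_from("<Q", block, within * self.ENTRY_SIZE)[0]


# Stand-in for a remote file: a 16-byte dummy header, then 240000 offsets.
count = 240_000
table = b"".join(struct.pack("<Q", i * 100) for i in range(count))
fake_file = b"\x00" * 16 + table


def read_range(offset: int, length: int) -> bytes:
    # A real implementation would send "Range: bytes=offset-(offset+length-1)".
    return fake_file[offset:offset + length]


index = LazyOffsetIndex(read_range, table_start=16, count=count)
offset = index[123_456]
print(offset, index.bytes_fetched, len(table))
```

With these made-up numbers, looking up one feature fetches a single 8 KB block instead of the whole offset table. The block size trades the number of requests against the bytes transferred per request.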
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev