Ari,

I need to read from a large Parquet file (10-20 GB, in S3) features using a set of user defined constraints that I can parse into non-spatial SQL and polygon masks. My tests so far show good performance with a single non-spatial constraint and (separately) with a bbox.

Do you mean you get bad performance when setting both SetAttributeFilter() and SetSpatialFilter[Rect]()? I cannot explain that. Combining them should not be less performant.

You don't mention whether your GeoParquet files have a covering bounding box column. For the default WKB encoding, this is essential to avoid a full scan of the file.
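One way to check for the covering column is to look at the "geo" key of the Parquet file metadata. The sketch below assumes the file follows GeoParquet 1.1, where a column's optional "covering" entry describes the bbox column; the helper name bbox_covering is made up for illustration, and the pyarrow call in the comment is one possible way to obtain the metadata, not the only one.

```python
import json


def bbox_covering(geo_metadata):
    """Return the bbox covering entry of the primary geometry column, or None.

    geo_metadata is the value of the "geo" key of the Parquet file metadata,
    as str or bytes (assumed to follow the GeoParquet 1.1 layout).
    """
    if isinstance(geo_metadata, bytes):
        geo_metadata = geo_metadata.decode("utf-8")
    meta = json.loads(geo_metadata)
    primary = meta.get("primary_column")
    column = meta.get("columns", {}).get(primary, {})
    # GeoParquet 1.1 stores the covering as {"bbox": {"xmin": [...], ...}}
    return column.get("covering", {}).get("bbox")


# Possible usage with pyarrow (the path is a placeholder):
#   import pyarrow.parquet as pq
#   schema = pq.read_schema("file.parquet")
#   print(bbox_covering(schema.metadata[b"geo"]))
```

A None result means no bbox covering is declared for the primary geometry column, in which case a spatial filter on WKB geometries would likely have to read every row.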

However, I am not sure how to go forward with mixing non-spatial constraints and perhaps multiple arbitrary polygons (which may be non-adjacent).

If you have something like attr_filter && (Intersects(geom, poly1) || Intersects(geom, poly2)), then you should issue separate requests: first attr_filter && Intersects(geom, poly1), and then attr_filter && Intersects(geom, poly2).
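A minimal sketch of that per-polygon approach, written against the OGR layer methods SetAttributeFilter(), SetSpatialFilter(), GetFID() and Geometry.Intersects(). The function name read_matching_features is hypothetical. It assumes that FIDs are stable between successive reads of the same layer, so a feature touching both polygons can be dropped the second time, and it repeats the Intersects() test explicitly in case the driver only filters on bounding boxes.

```python
def read_matching_features(layer, attr_sql, polygons):
    """Return features that match attr_sql and intersect at least one polygon.

    layer    -- an OGR layer (e.g. from a GeoParquet dataset)
    attr_sql -- attribute filter string, as accepted by SetAttributeFilter()
    polygons -- iterable of OGR geometries in the layer's SRS
    """
    seen_fids = set()
    matches = []
    layer.SetAttributeFilter(attr_sql)
    try:
        for poly in polygons:
            # One combined attribute + spatial request per polygon
            layer.SetSpatialFilter(poly)
            layer.ResetReading()
            for feature in layer:
                fid = feature.GetFID()
                if fid in seen_fids:
                    continue  # already returned for an earlier polygon
                geom = feature.GetGeometryRef()
                # Exact test, in case the driver filtered on bbox only
                if geom is not None and geom.Intersects(poly):
                    seen_fids.add(fid)
                    matches.append(feature)
    finally:
        # Leave the layer unfiltered for later use
        layer.SetSpatialFilter(None)
        layer.SetAttributeFilter(None)
    return matches
```

With the GDAL Python bindings, layer would typically come from ogr.Open(path).GetLayer(0) and each polygon from ogr.CreateGeometryFromWkt(). For result sets too large to hold in memory, the same loop can yield features instead of appending them to a list.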

GDAL SQL docs tell me that with Spatialite built-in I could use ST_Intersects but does that help with Parquet files?

No, because that wouldn't translate into a SetSpatialFilter[Rect]() request, and thus you would get a full scan of the file.

How about constructing the non-spatial SQL query first, use that on dataset, and then use SetSpatialFilterRect on the resulting layer object possibly multiple times plus ogr.Geometry.Intersects on each feature coming from the obtained layer? My intuition would tell me to first do the spatial filtering as that (may) narrow down the search considerably. But then I cannot use the non-spatial SQL as that requires a dataset to be executed on.

You could store the result of the spatial request in a temporary dataset (possibly in memory) and then apply the attribute filter. But as said above, I'm a bit surprised that combining the attribute filter and a (single geometry) spatial filter isn't efficient.
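A rough sketch of that temporary-dataset idea, assuming the caller supplies an in-memory OGR dataset and that CopyLayer() honors the spatial filter set on the source layer (which is my understanding of its behavior, but worth verifying on your GDAL version). The function name spatial_then_attribute and the default layer name are made up for illustration.

```python
def spatial_then_attribute(src_layer, mem_ds, bbox, attr_sql, name="subset"):
    """Copy the features inside bbox into mem_ds, then filter by attribute.

    src_layer -- OGR layer of the large source dataset
    mem_ds    -- writable in-memory OGR dataset, created by the caller
    bbox      -- (minx, miny, maxx, maxy) in the layer's SRS
    attr_sql  -- attribute filter applied to the in-memory copy
    """
    minx, miny, maxx, maxy = bbox
    src_layer.SetSpatialFilterRect(minx, miny, maxx, maxy)
    try:
        # Only the spatially filtered features are expected to be copied
        tmp_layer = mem_ds.CopyLayer(src_layer, name)
    finally:
        src_layer.SetSpatialFilter(None)  # restore the source layer
    if tmp_layer is None:
        raise RuntimeError("CopyLayer() failed")
    tmp_layer.SetAttributeFilter(attr_sql)
    return tmp_layer
```

The in-memory dataset could be created with ogr.GetDriverByName("Memory").CreateDataSource("tmp"); newer GDAL releases may prefer the "MEM" driver name instead. The returned layer belongs to mem_ds, so mem_ds has to stay alive for as long as the layer is read.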

Instead of the Parquet driver, you may also try DuckDB with the ADBC driver. The DuckDB SQL engine generally outperforms libarrow/libparquet.

Even

--
http://www.spatialys.com
My software is free, but my time generally not.

_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev