Hi all,

I need to read features from a large Parquet file (10-20 GB, in S3) using a set of user-defined constraints that I can parse into non-spatial SQL and polygon masks. My tests so far show good performance with a single non-spatial constraint and (separately) with a bbox. However, I am not sure how to go forward with combining non-spatial constraints with possibly multiple arbitrary polygons (which may be non-adjacent).

The GDAL SQL docs tell me that with SpatiaLite built in I could use ST_Intersects, but does that help with Parquet files? How about constructing the non-spatial SQL query first, running it on the dataset, and then calling SetSpatialFilterRect on the resulting layer object (possibly multiple times, once per polygon) plus ogr.Geometry.Intersects on each feature coming from the obtained layer? My intuition tells me to do the spatial filtering first, as that may narrow down the search considerably. But then I cannot use the non-spatial SQL, as that requires a dataset to be executed on.
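
To make the idea concrete, here is a rough sketch of the kind of loop I have in mind. It is not tested against real Parquet data. It assumes my non-spatial constraints reduce to a plain WHERE clause, so it uses SetAttributeFilter on the layer instead of ExecuteSQL on the dataset, which is a swap on my part. It also assumes that FIDs are stable between reads of the same layer. The function name select_features and the limit parameter are just made up for illustration.

```python
def select_features(layer, where, masks, limit=200):
    """Collect up to `limit` features that match `where` and intersect any mask.

    `layer` is expected to behave like an ogr.Layer and each mask like an
    ogr.Geometry polygon. `where` is only a WHERE clause, not a full SELECT.
    """
    selected = {}
    # The attribute filter stays active while the spatial filter changes.
    layer.SetAttributeFilter(where)
    try:
        for mask in masks:
            # OGR envelopes are ordered (minx, maxx, miny, maxy).
            minx, maxx, miny, maxy = mask.GetEnvelope()
            layer.SetSpatialFilterRect(minx, miny, maxx, maxy)
            for feature in layer:
                # Assumption: FIDs are stable across reads of this layer.
                fid = feature.GetFID()
                if fid in selected:
                    continue  # already picked up through an earlier mask
                geom = feature.GetGeometryRef()
                # Exact test, since the rectangle is only a coarse prefilter.
                if geom is not None and geom.Intersects(mask):
                    selected[fid] = feature
                    if len(selected) >= limit:
                        return list(selected.values())
    finally:
        # Leave the layer unfiltered for whoever uses it next.
        layer.SetSpatialFilter(None)
        layer.SetAttributeFilter(None)
    return list(selected.values())
```

I would call this with a layer obtained from something like ogr.Open("/vsis3/my-bucket/features.parquet").GetLayer(0) (a hypothetical path), a WHERE clause parsed from the user input, and a list of mask polygons. Whether the Parquet driver applies the two filters efficiently when both are set is exactly what I am unsure about.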

The user is actually warned against leaving out the spatial filter, as the Parquet files contain millions of features and the selection is in any case truncated to a few hundred features at most.

Any ideas?

Ari


_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev
