I often combine attribute and spatial filters on large Parquet files, calling 
SetSpatialFilter() and SetAttributeFilter() before querying. 
The only issues I've had were with partition elimination, and those have now been fixed. 
Queries over the ADBC connection can sometimes be faster, but opening the file with 
gdal.OpenEx() is slower, and ADBC takes more memory. I find the GDAL query 
method generally better. 
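
For readers following along, the pattern described above can be sketched roughly as follows. This is only a minimal sketch: the helper name iter_filtered, the /vsis3/ path, and the column name in the usage comment are hypothetical. The only layer calls it relies on are SetAttributeFilter(), SetSpatialFilter(), and plain iteration.

```python
def iter_filtered(layer, attr_filter, mask_geom):
    """Yield features from an OGR layer with both filters active.

    layer       -- an osgeo.ogr.Layer (e.g. from a Parquet dataset)
    attr_filter -- OGR SQL WHERE-clause text, e.g. "height > 10"
    mask_geom   -- an ogr.Geometry used as the spatial filter
    """
    # Both filters stay set on the layer until they are cleared, so the
    # driver sees them together when the layer is read.
    layer.SetAttributeFilter(attr_filter)
    layer.SetSpatialFilter(mask_geom)
    try:
        for feature in layer:
            yield feature
    finally:
        # Clear the filters so later reads of the layer are unaffected.
        layer.SetSpatialFilter(None)
        layer.SetAttributeFilter(None)


# Usage sketch (hypothetical path and column name):
#   from osgeo import gdal, ogr
#   ds = gdal.OpenEx("/vsis3/example-bucket/data.parquet", gdal.OF_VECTOR)
#   layer = ds.GetLayer(0)
#   mask = ogr.CreateGeometryFromWkt("POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))")
#   for feature in iter_filtered(layer, "height > 10", mask):
#       print(feature.GetFID())
```

Because this is a generator, the filters are only applied once iteration starts, and they are cleared when the loop finishes or the generator is closed.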

Having access to the SQL functions of DuckDB is the only reason I ever use 
ADBC. 

Mike


-- 

Michael Smith 
RSGIS Center – ERDC CRREL NH 
US Army Corps 





On 1/18/26, 11:02 AM, "gdal-dev on behalf of Ari Jolma via gdal-dev" 
<[email protected] on behalf of [email protected]> wrote:


Even Rouault wrote on 18.1.2026 at 16:50:


> Ari,
>
>> I need to read from a large Parquet file (10-20 GB, in S3) features 
>> using a set of user defined constraints that I can parse into 
>> non-spatial SQL and polygon masks. My tests so far show good 
>> performance with a single non-spatial constraint and (separately) 
>> with a bbox. 
>
> Do you mean you get bad performance when setting both 
> SetAttributeFilter() and SetSpatialFilter[Rect]() ? I cannot explain 
> that. Combining them should not be less performant.




No, I'm just looking for how best to mix spatial and non-spatial 
filters/constraints when retrieving features from a Parquet file using GDAL.




>
> You don't mention if your geoparquet files have a covering bounding 
> box column. For the default WKB encoding, this is essential to avoid 
> full scan of the file.




I don't know about that - will check - but the basic 
SetSpatialFilterRect on a GDAL Python layer works fine.
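
One way to check for the covering bounding box column is to look at the file's GeoParquet "geo" metadata. The sketch below assumes the file follows the GeoParquet 1.1 layout, where each entry under "columns" may carry a "covering" object with a "bbox" member. The helper name bbox_covering and the file path in the usage comment are hypothetical.

```python
import json


def bbox_covering(geo_metadata):
    """Map each geometry column to its covering bbox spec, or None.

    geo_metadata -- the value of the Parquet key-value metadata entry
                    named "geo", as str or bytes (JSON text).
    """
    if isinstance(geo_metadata, bytes):
        geo_metadata = geo_metadata.decode("utf-8")
    geo = json.loads(geo_metadata)
    return {
        name: column.get("covering", {}).get("bbox")
        for name, column in geo.get("columns", {}).items()
    }


# Usage sketch (hypothetical local path; requires pyarrow):
#   import pyarrow.parquet as pq
#   meta = pq.read_metadata("data.parquet").metadata  # dict with bytes keys
#   print(bbox_covering(meta[b"geo"]))
```

A value of None for a column means no bbox covering is declared for it in the metadata.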




>
>
>> However, I am not sure how to go forward with mixing non-spatial 
>> constraints and perhaps multiple arbitrary polygons (which may be 
>> non-adjacent).
> If you have something like attr_filter && (Intersects(geom, poly1) || 
> Intersects(geom, poly2)) , then you should do separately attr_filter 
> && Intersects(geom, poly1) and then attr_filter && Intersects(geom, 
> poly2)




Ok, so the attr_filter is not expensive even though it is applied twice.
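
The per-polygon approach suggested above could look roughly like the sketch below: the attribute filter is set once, then one spatial pass is made per polygon. The helper name iter_multi_polygon is hypothetical. The sketch assumes GetFID() returns the same value for the same feature on every pass, and it adds an explicit Intersects() test in case the spatial filter lets through features that only match the polygon's bounding box.

```python
def iter_multi_polygon(layer, attr_filter, polygons):
    """Yield each feature matching attr_filter and any polygon, once.

    layer       -- an osgeo.ogr.Layer
    attr_filter -- OGR SQL WHERE-clause text
    polygons    -- iterable of ogr.Geometry masks (may be non-adjacent)
    """
    seen = set()  # FIDs already yielded, to avoid duplicates across passes
    layer.SetAttributeFilter(attr_filter)
    try:
        for poly in polygons:
            # One filtered read per polygon: attr_filter AND poly.
            layer.SetSpatialFilter(poly)
            for feature in layer:
                fid = feature.GetFID()
                if fid in seen:
                    continue
                geom = feature.GetGeometryRef()
                # Exact geometry test against the polygon mask.
                if geom is not None and geom.Intersects(poly):
                    seen.add(fid)
                    yield feature
    finally:
        # Leave the layer unfiltered for later use.
        layer.SetSpatialFilter(None)
        layer.SetAttributeFilter(None)
```

The set of seen FIDs grows with the number of matches, so for very large result sets a different de-duplication strategy may be needed.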




>
>>
>> GDAL SQL docs tell me that with Spatialite built-in I could use 
>> ST_Intersects but does that help with Parquet files? 
> No, because that wouldn't translate as a SetSpatialFilter[Rect]() 
> request, and thus you would get full scan of the file




Ok, I assumed that too.




>
>> How about constructing the non-spatial SQL query first, use that on 
>> dataset, and then use SetSpatialFilterRect on the resulting layer 
>> object possibly multiple times plus ogr.Geometry.Intersects on each 
>> feature coming from the obtained layer? My intuition would tell me to 
>> first do the spatial filtering as that (may) narrow down the search 
>> considerably. But then I cannot use the non-spatial SQL as that 
>> requires a dataset to be executed on.
>
> You could store the result of the spatial request in a temporary 
> dataset (possibly in memory) and then apply the attribute filter. But 
> as said above, I'm a bit surprised that combining the attribute filter 
> and a (single geometry) spatial filter isn't efficient.




Maybe I was not clear: at this point I'm only wondering how best to 
combine the attribute filter and the spatial filter.




>
> Instead of the Parquet driver, you may also try with duckdb and the 
> ADBC driver. The duckdb SQL engine generally outperforms 
> libarrow/libparquet.




Hm, the Parquet files are a given at this point. I'm doing 
consultancy/development for a client and Parquet is their choice, so I 
guess I have a developer role now. :)




>
> Even
>


Thanks,


Ari




_______________________________________________
gdal-dev mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/gdal-dev



