Hi Jukka,

Yes, GPKG works pretty well with vsis3 too, but Parquet will probably be my client's choice.

Ari

Rahkonen Jukka wrote on 23.1.2026 at 10.15:
Hi Ari,

But don't you know the answer already? You wrote:
"the feature is retrieved from GPKG really fast (also if the file is in S3)"
GeoPackage is not listed as a cloud-optimized format, but with GDAL it actually 
works pretty well with vsicurl.
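For example, something along these lines already gives fast random access 
(an untested sketch; the URL is made up):

    from osgeo import gdal, ogr

    gdal.UseExceptions()

    # Hypothetical URL; any HTTP server that supports range requests works.
    ds = ogr.Open("/vsicurl/https://example.com/data/features.gpkg")
    layer = ds.GetLayer(0)

    # GeoPackage is SQLite-based, so a FID lookup is an indexed read and
    # only the needed pages are fetched over HTTP.
    feature = layer.GetFeature(1234)
    print(feature.ExportToJson() if feature is not None else "not found")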

-Jukka Rahkonen-

________________________________________
From: gdal-dev <[email protected]> on behalf of Ari Jolma via 
gdal-dev <[email protected]>
Sent: Friday, 23 January 2026 9.32
To: Even Rouault <[email protected]>; Michael Smith 
<[email protected]>; [email protected] <[email protected]>
Subject: Re: [gdal-dev] Reading from (geo)parquet using mixed spatial and 
non-spatial filters


Thanks Even,

An attribute filter fid = <fid> seems fast, but ID = <ID> is not. Hm,
the whole idea is to use files in S3 instead of data in AWS RDS, as the
data is static and RDS costs are high (i.e., the reason is not technical).
Our use case is mostly bbox searches followed by extracting a single
feature or doing a simple mixed spatial and attribute search.
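For concreteness, the kind of access pattern I mean looks roughly like this 
(the bucket, bbox and column name are made up):

    from osgeo import gdal, ogr

    gdal.UseExceptions()

    ds = ogr.Open("/vsis3/my-bucket/data/features.parquet")
    layer = ds.GetLayer(0)

    # bbox search ...
    layer.SetSpatialFilterRect(24.9, 60.1, 25.1, 60.3)
    # ... optionally combined with a simple attribute condition
    layer.SetAttributeFilter("ID = 'ABC123'")

    for feature in layer:
        print(feature.GetFID(), feature.GetField("ID"))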

Ari

Even Rouault wrote on 22.1.2026 at 22.09:
Hi Ari,

Looking at the code, I see the driver reads all row groups, whereas it
could potentially be improved to use row-group-level statistics to skip
all of them but the matching one. That said, you can probably work around
the issue by using SetAttributeFilter("fid = <the-fid>") instead, or by
querying the ID column directly if that's your ultimate objective.
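Something like this untested sketch (the path and fid are placeholders):

    from osgeo import gdal, ogr

    gdal.UseExceptions()

    ds = ogr.Open("/vsis3/my-bucket/data/features.parquet")
    layer = ds.GetLayer(0)

    # Express the lookup as an attribute filter instead of GetFeature(fid),
    # so the condition is evaluated while scanning the Parquet file.
    layer.SetAttributeFilter("fid = 123456")
    feature = layer.GetNextFeature()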

More generally, Parquet shines at requesting a significant amount of
data / bulk loading scenarios rather than at extracting a single feature,
where you'll get better performance with a regular database and proper
indices built.

Even

On 22/01/2026 at 12:49, Ari Jolma wrote:
Thanks for the replies. I'm progressing, but now I hit something I
don't understand.

I have a large GPKG file, which I converted into a Parquet file. If I
now do a simple layer.GetFeature(fid) on a random fid on the layer,
the feature is retrieved from the GPKG really fast (also if the file is
in S3), but from Parquet it is slow (~20 secs) even on a local filesystem.

For both files, layer.GetFIDColumn() reports "fid". There is a native
"ID" column in the GPKG, but fid <> ID.

I used ogr2ogr to create the Parquet file, with -lco COMPRESSION=None.
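The equivalent with the Python bindings would be roughly the following 
(file names are placeholders):

    from osgeo import gdal

    gdal.UseExceptions()

    # Equivalent of the ogr2ogr call; COMPRESSION is a Parquet layer
    # creation option.
    gdal.VectorTranslate(
        "features.parquet",
        "features.gpkg",
        format="Parquet",
        layerCreationOptions=["COMPRESSION=NONE"],
    )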

Ari

Michael Smith wrote on 18.1.2026 at 18.09:
I combine attribute and spatial filters a lot on large Parquet files,
using a combination of SetSpatialFilter() and SetAttributeFilter()
before querying. I've only had some issues with partition elimination,
which have now been fixed. Sometimes the ADBC connection can be faster
to query, but opening the file with gdal.OpenEx() is slower. And ADBC
takes more memory. I find the gdal query method generally better.
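Roughly what I do, as an untested sketch (the dataset path, geometry and 
column name are made up):

    from osgeo import gdal, ogr

    gdal.UseExceptions()

    ds = gdal.OpenEx("/vsis3/my-bucket/big.parquet", gdal.OF_VECTOR)
    layer = ds.GetLayer(0)

    # Both filters are set before iterating, so the driver can take them
    # into account (e.g. for partition elimination) before reading.
    geom = ogr.CreateGeometryFromWkt(
        "POLYGON((24.9 60.1,25.1 60.1,25.1 60.3,24.9 60.3,24.9 60.1))")
    layer.SetSpatialFilter(geom)
    layer.SetAttributeFilter("category = 'road'")

    for feature in layer:
        pass  # process the filtered features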

Having access to the SQL functions of DuckDB is the only reason I
ever use ADBC.

Mike



