paleolimbot commented on issue #617:
URL: https://github.com/apache/sedona-db/issues/617#issuecomment-3901979492
You're correct that SedonaDB will push down `WHERE
ST_Intersects(ST_GeomFromWKT('...', 4326))` automatically using bounding box
statistics in the `bbox` column of a GeoParquet file; however, DataFusion
doesn't support pruning on struct column fields, so we can't either (support
will be added in the forthcoming DataFusion 53). You can check whether pruning
happened by running `EXPLAIN ANALYZE (query)` and checking the bottom right
column of the output. For Overture I usually need to collect the result first,
because it is long enough that the display gets truncated.
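
As a rough sketch of that check, the statement below wraps a spatial filter in `EXPLAIN ANALYZE`. The table name `buildings`, the column name `geometry`, and the polygon are placeholders for illustration, not names from the issue; substitute your own query.

```sql
-- Hypothetical names: `buildings` and `geometry` are placeholders.
-- The polygon is an arbitrary illustrative area of interest.
EXPLAIN ANALYZE
SELECT count(*)
FROM buildings
WHERE ST_Intersects(
    geometry,
    ST_GeomFromWKT('POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))', 4326)
);
```

If pruning occurred, the metrics reported for the Parquet scan should reflect it; the exact metric names depend on the DataFusion version in use.
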

If the pruning did occur, I suspect that `WHERE
ST_Intersects(ST_GeomFromWKT('...', 4326))` is slower in SedonaDB because
DuckDB does a better job of caching remote files, metadata, and file listings.
You can check whether that explains the difference by running `SET
enable_external_file_cache = false;` in DuckDB. There are some improvements in
DataFusion 52 that should help there, and also
https://github.com/apache/sedona-db/pull/294, which didn't help at the time the
PR was opened but perhaps does now.
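
One way to run that comparison in DuckDB, sketched under the assumption that the same query is run once with the cache disabled and once with the setting left at its default. The table and column names are placeholders (here `buildings` stands in for a view over the remote GeoParquet files), and `ST_GeomFromText` is used as the WKT constructor from DuckDB's spatial extension.

```sql
-- The spatial extension provides ST_Intersects and ST_GeomFromText.
INSTALL spatial;
LOAD spatial;

-- Disable DuckDB's cache for externally read files for this session.
SET enable_external_file_cache = false;

-- Hypothetical table and column names; the polygon is illustrative only.
EXPLAIN ANALYZE
SELECT count(*)
FROM buildings
WHERE ST_Intersects(
    geometry,
    ST_GeomFromText('POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))')
);
```

If the DuckDB timing with the cache disabled moves close to the SedonaDB timing, caching is the likely cause of the gap.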