Kontinuation opened a new pull request, #611:
URL: https://github.com/apache/sedona-db/pull/611
## Summary
KNN joins have different semantics than regular spatial joins: pushing
filters to the object (build) side changes which objects are the k nearest
neighbors, producing incorrect results. DataFusion's built-in `PushDownFilter`
optimizer rule is unaware of this and pushes filters through KNN joins.
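To make the semantic difference concrete, here is a minimal, self-contained sketch in plain Python (not SedonaDB code). The `k_nearest` helper, the sample points, and the `open` attribute are all hypothetical and only illustrate why filtering the object side before the KNN step changes the answer.

```python
from math import dist


def k_nearest(query, objects, k):
    """Return the k objects closest to `query` (ties keep input order)."""
    return sorted(objects, key=lambda o: dist(query, o["xy"]))[:k]


query = (0.0, 0.0)
objects = [
    {"name": "a", "xy": (1.0, 0.0), "open": False},
    {"name": "b", "xy": (2.0, 0.0), "open": True},
    {"name": "c", "xy": (3.0, 0.0), "open": True},
]

# Intended semantics: find the 2 nearest neighbors, then apply the filter.
knn_then_filter = [o["name"] for o in k_nearest(query, objects, 2) if o["open"]]

# What an incorrect pushdown computes: filter the object side first,
# then find the 2 nearest neighbors among the survivors.
open_objects = [o for o in objects if o["open"]]
filter_then_knn = [o["name"] for o in k_nearest(query, open_objects, 2)]
```

With these inputs the first form yields only `b`, while the pushed-down form yields `b` and `c`, because removing `a` early lets `c` enter the top 2.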
This PR adds a `KnnJoinEarlyRewrite` optimizer rule that converts KNN joins
to `SpatialJoinPlanNode` extension nodes **before** DataFusion's
`PushDownFilter` rule runs. Extension nodes naturally block filter pushdown via
`prevent_predicate_push_down_columns()` returning all columns.
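The blocking behavior can be pictured with a toy model, again in plain Python rather than DataFusion's Rust API. It assumes the optimizer keeps any predicate that references a "prevented" column above the node and pushes the rest below it; `split_predicates` and the column names are hypothetical.

```python
def split_predicates(predicates, prevented_columns):
    """Split predicates into (kept above the node, pushable below it).

    `predicates` is a list of (text, referenced_columns) pairs.
    """
    prevented = set(prevented_columns)
    kept, pushable = [], []
    for text, columns in predicates:
        if prevented & set(columns):
            kept.append(text)      # touches a prevented column: stays above
        else:
            pushable.append(text)  # safe to push below the node
    return kept, pushable


node_columns = ["q.id", "q.geom", "o.id", "o.geom", "o.open"]
predicates = [("o.open = true", ["o.open"]), ("q.id > 10", ["q.id"])]

# Extension node that reports all of its columns as prevented.
kept_all, pushable_all = split_predicates(predicates, node_columns)

# For contrast: a node that prevents nothing lets every predicate through.
kept_none, pushable_none = split_predicates(predicates, [])
```

When every column is reported as prevented, no predicate can cross the node, which is the property this PR relies on for KNN joins.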
## Changes
- **New `KnnJoinEarlyRewrite` optimizer rule** that handles two patterns:
  1. `Join(filter=ST_KNN(...))`: when the ON clause has only the spatial
     predicate
  2. `Filter(ST_KNN(...), Join(on=[...]))`: when the ON clause also has
     equi-join conditions (DataFusion's SQL planner separates these)
- **Positional rule insertion** — `MergeSpatialProjectionIntoJoin` and
`KnnJoinEarlyRewrite` are inserted before `PushDownFilter`, while
`SpatialJoinLogicalRewrite` (for non-KNN joins) remains after so non-KNN joins
still benefit from filter pushdown
- **Updated `SpatialJoinLogicalRewrite`** — skips KNN joins (already handled
by the early rewrite)
- **Integration tests** verifying that object-side filters are NOT pushed
down for KNN joins, but ARE pushed down for non-KNN spatial joins
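The two plan shapes handled by the early rewrite can be sketched with a toy plan representation. This is an illustration in Python, not the Rust implementation in this PR: the `Join` and `Filter` classes, the string-based `is_knn` check, and the requirement that pattern 2 has a non-empty `on` list are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Join:
    on: list = field(default_factory=list)  # equi-join column pairs
    filter: Optional[str] = None            # non-equi join filter, if any


@dataclass
class Filter:
    predicate: str
    input: object


def is_knn(predicate):
    """Toy check: treat a predicate as KNN if it is an ST_KNN(...) call."""
    return predicate is not None and predicate.upper().startswith("ST_KNN(")


def matches_knn_join(plan):
    # Pattern 1: the KNN predicate is the join filter itself.
    if isinstance(plan, Join):
        return is_knn(plan.filter)
    # Pattern 2: the KNN predicate sits in a Filter above an equi-join.
    if isinstance(plan, Filter) and isinstance(plan.input, Join):
        return is_knn(plan.predicate) and bool(plan.input.on)
    return False


pattern_1 = Join(filter="ST_KNN(q.geom, o.geom, 3, false)")
pattern_2 = Filter("ST_KNN(q.geom, o.geom, 3, false)", Join(on=[("q.id", "o.id")]))
non_knn = Join(filter="ST_Intersects(q.geom, o.geom)")
```

Here `matches_knn_join` accepts `pattern_1` and `pattern_2` and rejects `non_knn`, mirroring the split between the early rewrite and the later `SpatialJoinLogicalRewrite`.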
## Rule ordering
```
... → MergeSpatialProjectionIntoJoin → KnnJoinEarlyRewrite → PushDownFilter
→ ... → SpatialJoinLogicalRewrite
```
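The positional insertion amounts to splicing the new rules in front of a named rule in an ordered list. A minimal sketch in Python, where the rule names are placeholder strings and `insert_before` is a hypothetical helper (the real optimizer holds rule objects, not strings):

```python
def insert_before(rules, anchor, new_rules):
    """Return a copy of `rules` with `new_rules` spliced in before `anchor`."""
    i = rules.index(anchor)  # raises ValueError if the anchor rule is missing
    return rules[:i] + new_rules + rules[i:]


# Placeholder stand-in for the default optimizer rule list.
rules = ["simplify_expressions", "push_down_filter", "push_down_limit"]

# KNN-related rules must run before filter pushdown.
rules = insert_before(
    rules,
    "push_down_filter",
    ["merge_spatial_projection_into_join", "knn_join_early_rewrite"],
)

# The non-KNN rewrite stays at the end so those joins still get pushdown.
rules.append("spatial_join_logical_rewrite")
```

Looking the anchor up by name rather than by a fixed index keeps the ordering intact if rules are added or removed elsewhere in the list.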
Closes https://github.com/apache/sedona-db/issues/605