alamb commented on issue #15037:
URL: https://github.com/apache/datafusion/issues/15037#issuecomment-2734575745
> Does anyone have a handle on how we might implement this? I was thinking
> we’d need to add a method to exec operators called `apply_filter` but that
> basically sends down the additional filter and by default it gets forwarded to
> children until it hits an exec that knows what to do with it (eg
> DataSourceExec). But I’m not very clear beyond that.
To begin with I would suggest:
1. Make a new PhysicalExpr named something like `TopKRuntimeFilter`
2. Add a physical optimizer pass that runs after all other passes (so the
structure doesn't change) that finds `TopK` nodes and tries to find the scans
connected to them (start with some basic rules, don't try to go past joins, etc.)
3. Add `TopKRuntimeFilter` to those scans
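As a rough illustration of steps 2 and 3, here is a minimal sketch on a toy plan tree. The `Plan` enum, `push_topk_filters`, and `attach_below` are hypothetical names made up for this example; they are not DataFusion's `ExecutionPlan` or optimizer APIs. The sketch also assumes a projection passes the sort column through unchanged, which a real rule would have to check.

```rust
/// Toy plan node (hypothetical; a stand-in for real physical plan nodes).
#[derive(Debug)]
enum Plan {
    TopK { input: Box<Plan> },
    Projection { input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
    Scan { name: String, runtime_filters: Vec<String> },
}

/// Walk the whole plan looking for TopK nodes. For each one, try to attach
/// `filter` to the scans below it. Returns how many scans were updated.
fn push_topk_filters(plan: &mut Plan, filter: &str) -> usize {
    match plan {
        Plan::TopK { input } => attach_below(input, filter),
        Plan::Projection { input } => push_topk_filters(input, filter),
        Plan::Join { left, right } => {
            push_topk_filters(left, filter) + push_topk_filters(right, filter)
        }
        Plan::Scan { .. } => 0,
    }
}

/// Descend from a TopK towards its scan using deliberately basic rules:
/// pass through projections, stop at joins and nested TopK nodes.
fn attach_below(plan: &mut Plan, filter: &str) -> usize {
    match plan {
        Plan::Projection { input } => attach_below(input, filter),
        Plan::Scan { runtime_filters, .. } => {
            runtime_filters.push(filter.to_string());
            1
        }
        // Conservative: do not try to go past joins or another TopK.
        Plan::Join { .. } | Plan::TopK { .. } => 0,
    }
}
```

Calling `push_topk_filters` on a `TopK -> Projection -> Scan` tree attaches the filter to the scan, while a `TopK -> Join -> Scan` tree is left untouched.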
Then the trick will be to figure out how to share the `TopKHeap` created in
the TopK operator
https://github.com/apache/datafusion/blob/8c8b2454cbd78204dc6426f9898b79c179486a86/datafusion/physical-plan/src/topk/mod.rs#L259
with the `TopKRuntimeFilter`, and then orchestrate concurrent access to it somehow.
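One possible shape for the sharing, sketched with only the standard library: rather than sharing the whole heap, the operator could publish just its current worst value behind an `Arc<RwLock<..>>`, and the filter could read it. Everything here (`SharedThreshold`, `ToyTopK`, `ToyRuntimeFilter`) is a hypothetical stand-in, not the real `TopKHeap` or a real `PhysicalExpr`. It assumes a single `i64` sort key and `ORDER BY x ASC LIMIT k`.

```rust
use std::collections::BinaryHeap;
use std::sync::{Arc, RwLock};

/// Hypothetical shared state: the worst value currently kept by the top-k
/// heap, or `None` while the heap is not yet full.
#[derive(Clone, Default)]
struct SharedThreshold(Arc<RwLock<Option<i64>>>);

impl SharedThreshold {
    fn set(&self, v: i64) {
        *self.0.write().unwrap() = Some(v);
    }
    fn get(&self) -> Option<i64> {
        *self.0.read().unwrap()
    }
}

/// Toy stand-in for the TopK operator: keeps the k smallest values.
struct ToyTopK {
    k: usize,
    heap: BinaryHeap<i64>, // max-heap, so `peek` is the worst kept value
    threshold: SharedThreshold,
}

impl ToyTopK {
    fn new(k: usize, threshold: SharedThreshold) -> Self {
        Self { k, heap: BinaryHeap::new(), threshold }
    }

    fn insert(&mut self, v: i64) {
        if self.heap.len() < self.k {
            self.heap.push(v);
        } else if let Some(&worst) = self.heap.peek() {
            if v < worst {
                self.heap.pop();
                self.heap.push(v);
            }
        }
        // Publish the threshold only once the heap is full; before that,
        // every row can still make it into the result.
        if self.heap.len() == self.k {
            if let Some(&worst) = self.heap.peek() {
                self.threshold.set(worst);
            }
        }
    }
}

/// Toy stand-in for `TopKRuntimeFilter`: a scan would call this per value
/// (or per file/row-group minimum) to skip data that cannot qualify.
struct ToyRuntimeFilter {
    threshold: SharedThreshold,
}

impl ToyRuntimeFilter {
    fn may_qualify(&self, v: i64) -> bool {
        match self.threshold.get() {
            Some(t) => v < t,
            None => true,
        }
    }
}
```

A read-write lock keeps the sketch simple: the scans only read, and the TopK operator takes the write lock briefly when the threshold changes. A stale read is harmless here because the threshold only ever tightens, so an old value just lets a few extra rows through.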
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]