adriangb commented on code in PR #18449:
URL: https://github.com/apache/datafusion/pull/18449#discussion_r2505073073
##########
datafusion/physical-expr/src/utils/guarantee.rs:
##########
@@ -93,12 +93,12 @@ impl LiteralGuarantee {
/// Create a new instance of the guarantee if the provided operator is
/// supported. Returns None otherwise. See [`LiteralGuarantee::analyze`] to
/// create these structures from an predicate (boolean expression).
- fn new<'a>(
+ fn new(
Review Comment:
I think it's worth discussing in this review how far we propagate the
changes.
In particular, `InListExpr` will now have two operations modes:
1. Was built with an `ArrayRef` or was able to build an `ArrayRef` from a
homogeneously typed `Vec<Arc<dyn PhysicalExpr>>` which are all literals.
2. Was built with a `Vec<Arc<dyn PhysicalExpr>>` which are not literals or
homogeneously typed.
If we restrict `LiteralGuarantee` to only operate on the first cases, I
think we could lift out a lot of computation: instead of transforming
`ArrayRef` -> `Vec<Arc<dyn PhysicalExpr>>` -> `Vec<ScalarValue>` ->
`HashSet<ScalarValue>` which then gets fed into bloom filters which are
per-column and don't really support heterogenous `ScalarValue`s we could re-use
the already deduplicated `ArraySet` that `InListExpr` has internally or
something. The ultimate thing to do, but that would require even more work and
changes, would be to make `PruningPredicate::contains` accept an `enum
ArrayOrScalars { Array(ArrayRef), Scalars(Vec<ScalarValue>) }` so that we can
push down and iterate over the Arc'ed `ArrayRef` the whole way down. I think we
could make this backwards compatible.
I think that change is worth it, but it requires a bit more coordination
(with `arrow-rs`) and a bigger change.
The end result would be that:
1. When you create an `InListExpr` operates in mode (1) we are able to push
down into bloom filters with no data copies at all.
2. When the `InListExpr` operates in mode (2) we'd bail on the pushdown
early (e.g. `list() -> Option<ArrayRef>`) and avoid building the
`HashSet<ScalarValue>`, etc. that won't be used.
Wdyt @alamb ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]