alamb commented on code in PR #16185:
URL: https://github.com/apache/datafusion/pull/16185#discussion_r2110078689
##########
datafusion/physical-expr/src/equivalence/class.rs:
##########
@@ -422,6 +423,60 @@ impl EquivalenceGroup {
self.bridge_classes()
}
+ /// Returns a set of all adjacent expression pairs in this equivalence
group.
+ ///
+ /// For each equivalence class, this function creates pairs of expressions
that are adjacent
+ /// to each other. For example, if we have an equivalence class [a, b, c],
it will generate
+ /// pairs (a,b), (a,c), (b,c) and their reverse pairs (b,a), (c,a), (c,b).
+ ///
+ /// This is used by the `intersect` function to find common expression
pairs between
+ /// two equivalence groups.
+ #[allow(clippy::type_complexity)]
+ fn adjacent_set(&self) -> HashSet<(&Arc<dyn PhysicalExpr>, &Arc<dyn
PhysicalExpr>)> {
+ let mut set = HashSet::new();
Review Comment:
this algorithm is `O(N^2)` in the number of items in the equivalence class,
right?
We have hit other really long planning times when using such algorithms, for
example
- https://github.com/apache/datafusion/issues/13748
Is there some way to make this more efficient? Maybe by creating an 2x
boolean array to represent the intersections?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]