Re: [PR] fix: equivalence for union [datafusion]

via GitHub Tue, 27 May 2025 12:55:14 -0700


alamb commented on code in PR #16185:
URL: https://github.com/apache/datafusion/pull/16185#discussion_r2110078689



##########
datafusion/physical-expr/src/equivalence/class.rs:
##########
@@ -422,6 +423,60 @@ impl EquivalenceGroup {
         self.bridge_classes()
     }
 
+    /// Returns a set of all adjacent expression pairs in this equivalence 
group.
+    ///
+    /// For each equivalence class, this function creates pairs of expressions 
that are adjacent
+    /// to each other. For example, if we have an equivalence class [a, b, c], 
it will generate
+    /// pairs (a,b), (a,c), (b,c) and their reverse pairs (b,a), (c,a), (c,b).
+    ///
+    /// This is used by the `intersect` function to find common expression 
pairs between
+    /// two equivalence groups.
+    #[allow(clippy::type_complexity)]
+    fn adjacent_set(&self) -> HashSet<(&Arc<dyn PhysicalExpr>, &Arc<dyn 
PhysicalExpr>)> {
+        let mut set = HashSet::new();

Review Comment:
   this algorithm is `O(N^2)` in the number of items in the equivalence class, 
right? 
   
   We have hit other really long planning times when using such algorithms, for 
example
   - https://github.com/apache/datafusion/issues/13748
   
   Is there some way to make this more efficient? Maybe by creating an 2x 
boolean array to represent the intersections?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: equivalence for union [datafusion]

Reply via email to