mingmwang commented on code in PR #4043:
URL: https://github.com/apache/arrow-datafusion/pull/4043#discussion_r1012391732


##########
datafusion/physical-expr/src/utils.rs:
##########
@@ -65,6 +74,210 @@ pub fn sort_expr_list_eq_strict_order(
     list1.len() == list2.len() && list1.iter().zip(list2.iter()).all(|(e1, 
e2)| e1.eq(e2))
 }
 
+/// Assume the predicate is in the form of CNF, split the predicate to a Vec 
of PhysicalExprs.
+///
+/// For example, split "a1 = a2 AND b1 <= b2 AND c1 != c2" into ["a1 = a2", 
"b1 <= b2", "c1 != c2"]
+///
+pub fn split_predicate(predicate: &Arc<dyn PhysicalExpr>) -> Vec<&Arc<dyn 
PhysicalExpr>> {
+    match predicate.as_any().downcast_ref::<BinaryExpr>() {
+        Some(binary) => match binary.op() {
+            Operator::And => {
+                let mut vec1 = split_predicate(binary.left());
+                let vec2 = split_predicate(binary.right());
+                vec1.extend(vec2);
+                vec1
+            }
+            _ => vec![predicate],
+        },
+        None => vec![],
+    }
+}
+
+/// Combine the new equal condition with the existing equivalence properties.
+pub fn combine_equivalence_properties(
+    eq_properties: &mut Vec<EquivalenceProperties>,
+    new_condition: (&Column, &Column),
+) {
+    let mut idx1 = -1i32;
+    let mut idx2 = -1i32;

Review Comment:
   > Looking very impressive @mingmwang -- thank you very much
   > 
   > My biggest question is how are the changes to distribution tested? I see 
code that verifies partitioning (or rather not partitioning) with UnionExec but 
there are changes made to all the other physical operators.
   > 
   > For example what about tests for `WindowAggregate` and outer joins and 
sort merge join?
   > 
   > I saw tests for some of the functions for operating on 
`EquivalenceProperties` 👍 but not all of them.
   > 
   > I left some style questions about encapsulating `EquivalenceProperties` 
that might also help
   > 
   > So TLDR is I think the changes to the physical operators need more tests.
   > 
   > Maybe you could break out the equivalence class code into a separate PR?
   
   In the 3rd Part of the PR, I will have 7~8 UTs to verify the distribution of 
the physical plans. Maybe they are more like
   INTG tests, but the logic will be carefully verified. 



##########
datafusion/physical-expr/src/utils.rs:
##########
@@ -65,6 +74,210 @@ pub fn sort_expr_list_eq_strict_order(
     list1.len() == list2.len() && list1.iter().zip(list2.iter()).all(|(e1, 
e2)| e1.eq(e2))
 }
 
+/// Assume the predicate is in the form of CNF, split the predicate to a Vec 
of PhysicalExprs.
+///
+/// For example, split "a1 = a2 AND b1 <= b2 AND c1 != c2" into ["a1 = a2", 
"b1 <= b2", "c1 != c2"]
+///
+pub fn split_predicate(predicate: &Arc<dyn PhysicalExpr>) -> Vec<&Arc<dyn 
PhysicalExpr>> {
+    match predicate.as_any().downcast_ref::<BinaryExpr>() {
+        Some(binary) => match binary.op() {
+            Operator::And => {
+                let mut vec1 = split_predicate(binary.left());
+                let vec2 = split_predicate(binary.right());
+                vec1.extend(vec2);
+                vec1
+            }
+            _ => vec![predicate],
+        },
+        None => vec![],
+    }
+}
+
+/// Combine the new equal condition with the existing equivalence properties.
+pub fn combine_equivalence_properties(
+    eq_properties: &mut Vec<EquivalenceProperties>,
+    new_condition: (&Column, &Column),
+) {
+    let mut idx1 = -1i32;
+    let mut idx2 = -1i32;

Review Comment:
   > Looking very impressive @mingmwang -- thank you very much
   > 
   > My biggest question is how are the changes to distribution tested? I see 
code that verifies partitioning (or rather not partitioning) with UnionExec but 
there are changes made to all the other physical operators.
   > 
   > For example what about tests for `WindowAggregate` and outer joins and 
sort merge join?
   > 
   > I saw tests for some of the functions for operating on 
`EquivalenceProperties` 👍 but not all of them.
   > 
   > I left some style questions about encapsulating `EquivalenceProperties` 
that might also help
   > 
   > So TLDR is I think the changes to the physical operators need more tests.
   > 
   > Maybe you could break out the equivalence class code into a separate PR?
   
   In the 3rd Part of the PR, I will have 7~8 UTs to verify the distribution of 
the physical plans. Maybe they are more like INTG tests, but the logic will be 
carefully verified. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to