isidentical commented on code in PR #3912:
URL: https://github.com/apache/arrow-datafusion/pull/3912#discussion_r1002132576
##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -771,32 +745,47 @@ fn compare_left_boundaries(
// it is actually smaller than what we have right now) and it is a valid
// value (e.g. [0, 100] < -100 would update the boundaries to [0, -100] if
// there weren't the selectivity check).
- if rhs_scalar_value < variadic_max && selectivity > 0.0 {
- (variadic_min, rhs_scalar_value)
+ if right < left_max && selectivity > 0.0 {
+ (left_min, right)
} else {
- (variadic_min, variadic_max)
+ (left_min, left_max)
}
}
Operator::Gt | Operator::GtEq => {
// Same as above, but this time we want to limit the lower bound.
- if rhs_scalar_value > variadic_min && selectivity > 0.0 {
- (rhs_scalar_value, variadic_max)
+ if right > left_min && selectivity > 0.0 {
+ (right, left_max)
} else {
- (variadic_min, variadic_max)
+ (left_min, left_max)
}
}
// For equality, we don't have the range problem so even if the selectivity
// is 0.0, we can still update the boundaries.
- Operator::Eq => (rhs_scalar_value.clone(), rhs_scalar_value),
+ Operator::Eq => (right.clone(), right),
_ => unreachable!(),
};
- Some(ExprBoundaries {
- min_value: new_min,
- max_value: new_max,
- distinct_count,
- selectivity: Some(selectivity),
- })
+ // The context represents all the knowledge we have gathered during the
+ // analysis process, which we can now add more since the expression's upper
+ // and lower boundaries might have changed.
+ let left_bounds = ExprBoundaries::new(left_min, left_max, left_bounds.distinct_count);
+ left.apply(context, &left_bounds);
Review Comment:
> 🤔 don't the rules about combining information about the left and right
> side depend on the operators?

Exactly! And that is certainly possible with the current implementation (all
of the control is delegated to the expression's own `analyze()`
implementation). This part is only about a single comparison, but the same
system (albeit used a bit differently) also works for logical operators such
as AND and OR, like in the following example (originally from #3898):
> Each expression can either decide to share all the context with its
> children (e.g. in the case of AND, they'll all share a single context, so
> all the following expressions would know that `a`, for example, can only be
> greater than 6 at this point, etc.) or fork the context and then merge it
> back together using its own logic (OR is the most basic example, where you
> have to take the union of the ranges since it could be either of them).
If we were to implement `AND` and `OR`, then `AND` would pretty much pass the
same context throughout the conjunction. But for `OR`, when diverging to the
left and right sides, we would pass separate contexts (clones of the current
one), and once both sides have been processed we could merge the boundaries
of the columns in those contexts back into the existing context (basically a
union of left and right; `(a < 5) OR (a > 10)` would yield
`bounds(a) = [min(a), 5] U [10, max(a)]`). This allows us to robustly handle
chains of ANDs/ORs without worrying about it at the call sites (all the
responsibility is delegated to the parent expression, which knows how to
handle its children better than anyone 🙂).
If it sounds interesting, I can try to give implementing `AND`/`OR` a shot in
this PR (to have an actual use case). WDYT?