larry98 commented on code in PR #43256:
URL: https://github.com/apache/arrow/pull/43256#discussion_r1699040813


##########
cpp/src/arrow/compute/expression.cc:
##########
@@ -1242,8 +1273,92 @@ struct Inequality {
                             /*insert_implicit_casts=*/false, &exec_context);
   }
 
+  /// Simplify an `is_in` value set against an inequality guarantee.
+  ///
+  /// Simplifying an `is_in` predicate involves filtering out any values from
+  /// the value set that cannot possibly be found given the guarantee. For
+  /// example, if we have the predicate 'x is_in [1, 2, 3, 4]' and the 
guarantee
+  /// 'x > 2', then the simplified predicate 'x is_in [3, 4]' is equivalent.
+  /// This can be done efficiently if the value set is sorted and unique by
+  /// binary searching the inequality gound and slicing the value set
+  /// accordingly.
+  ///
+  /// \pre `guarantee` is non-nullable
+  /// \pre `guarantee.bound` is a scalar
+  /// \pre `guarantee.bound.type()->id() == value_set->type_id()`
+  /// \pre `value_set` is non-empty
+  /// \return a simplified value set, or a bool if the simplification results 
in
+  ///   a boolean literal predicate.
+  template <typename ArrowType>
+  static std::variant<std::shared_ptr<Array>, bool> SimplifyIsInValueSet(
+      const Inequality& guarantee, std::shared_ptr<Array> value_set) {
+    using ArrayType = typename TypeTraits<ArrowType>::ArrayType;
+    using ScalarType = typename TypeTraits<ArrowType>::ScalarType;
+    using CType = 
decltype(checked_pointer_cast<ArrayType>(value_set)->Value(0));

Review Comment:
   It fails for all string/binary like types.



##########
cpp/src/arrow/compute/expression.cc:
##########
@@ -1242,8 +1273,92 @@ struct Inequality {
                             /*insert_implicit_casts=*/false, &exec_context);
   }
 
+  /// Simplify an `is_in` value set against an inequality guarantee.
+  ///
+  /// Simplifying an `is_in` predicate involves filtering out any values from
+  /// the value set that cannot possibly be found given the guarantee. For
+  /// example, if we have the predicate 'x is_in [1, 2, 3, 4]' and the 
guarantee
+  /// 'x > 2', then the simplified predicate 'x is_in [3, 4]' is equivalent.
+  /// This can be done efficiently if the value set is sorted and unique by
+  /// binary searching the inequality gound and slicing the value set
+  /// accordingly.
+  ///
+  /// \pre `guarantee` is non-nullable
+  /// \pre `guarantee.bound` is a scalar
+  /// \pre `guarantee.bound.type()->id() == value_set->type_id()`
+  /// \pre `value_set` is non-empty
+  /// \return a simplified value set, or a bool if the simplification results 
in
+  ///   a boolean literal predicate.
+  template <typename ArrowType>
+  static std::variant<std::shared_ptr<Array>, bool> SimplifyIsInValueSet(

Review Comment:
   I don't follow the part about moving the `switch` into this function.
   
   I'm mainly unsure of whether we should handle the case where the guarantee 
type and value set type are not identical. Previously, the comparison compute 
function would have handled this for us by dispatching on the best kernel, but 
currently the code will just abort simplification if this is the case. Any 
thoughts on whether we should handle this here and the best way to go about 
this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to