felipecrv commented on code in PR #43256:
URL: https://github.com/apache/arrow/pull/43256#discussion_r1700408684
##########
cpp/src/arrow/compute/expression.cc:
##########
@@ -1242,8 +1273,92 @@ struct Inequality {
/*insert_implicit_casts=*/false, &exec_context);
}
+ /// Simplify an `is_in` value set against an inequality guarantee.
+ ///
+ /// Simplifying an `is_in` predicate involves filtering out any values from
+ /// the value set that cannot possibly be found given the guarantee. For
+ /// example, if we have the predicate 'x is_in [1, 2, 3, 4]' and the guarantee
+ /// 'x > 2', then the simplified predicate 'x is_in [3, 4]' is equivalent.
+ /// This can be done efficiently if the value set is sorted and unique by
+ /// binary searching the inequality bound and slicing the value set
+ /// accordingly.
+ ///
+ /// \pre `guarantee` is non-nullable
+ /// \pre `guarantee.bound` is a scalar
+ /// \pre `guarantee.bound.type()->id() == value_set->type_id()`
+ /// \pre `value_set` is non-empty
+ /// \return a simplified value set, or a bool if the simplification results in
+ /// a boolean literal predicate.
+ template <typename ArrowType>
+ static std::variant<std::shared_ptr<Array>, bool> SimplifyIsInValueSet(
Review Comment:
You can return a value that indicates the simplification should be aborted.
This is a common difficulty in kernel development: you want to specialize by
type (using templates) only the kernel loops (in this case, the loop is the
binary search) so that binary size doesn't explode.
<!-- Other libraries don't care so much about binary size, but Arrow's
success as a basic dependency of so many libraries comes from not being a
multi-GB library like many template-heavy compute kernel libraries can be. -->
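
To make the suggestion concrete, here is a rough, self-contained sketch of the shape I have in mind. It does not use Arrow's API: `TypeId`, `ValueSet`, `Result`, `UpperBoundIndex` and `SimplifyGreaterThan` are all hypothetical names, and it only handles a `x > bound` guarantee over a sorted, unique value set. Only the binary search is templated; the surrounding logic is type-erased and returns an explicit "abort" result for types it does not handle.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are not Arrow types.
enum class TypeId { kInt32, kInt64, kDouble, kOther };

struct ValueSet {
  TypeId type;
  const void* data;    // sorted, unique values of `type`
  std::size_t length;  // number of values
};

struct Result {
  enum class Kind { kAbort, kAlwaysFalse, kSlice };
  Kind kind;
  std::size_t offset;  // meaningful only when kind == kSlice
  std::size_t length;  // meaningful only when kind == kSlice
};

// The only code instantiated per type: the binary-search "kernel loop".
// Returns the index of the first value strictly greater than `bound`.
template <typename T>
std::size_t UpperBoundIndex(const void* data, std::size_t length,
                            const void* bound) {
  const T* begin = static_cast<const T*>(data);
  const T* it =
      std::upper_bound(begin, begin + length, *static_cast<const T*>(bound));
  return static_cast<std::size_t>(it - begin);
}

// Type-erased driver, compiled once. `bound` must point to a value of
// `set.type`. Unsupported types yield kAbort so the caller can keep the
// original, unsimplified predicate.
Result SimplifyGreaterThan(const ValueSet& set, const void* bound) {
  std::size_t first = 0;
  switch (set.type) {
    case TypeId::kInt32:
      first = UpperBoundIndex<std::int32_t>(set.data, set.length, bound);
      break;
    case TypeId::kInt64:
      first = UpperBoundIndex<std::int64_t>(set.data, set.length, bound);
      break;
    case TypeId::kDouble:
      first = UpperBoundIndex<double>(set.data, set.length, bound);
      break;
    default:
      return Result{Result::Kind::kAbort, 0, 0};
  }
  if (first == set.length) {
    // No value in the set can satisfy `x > bound`.
    return Result{Result::Kind::kAlwaysFalse, 0, 0};
  }
  return Result{Result::Kind::kSlice, first, set.length - first};
}
```

With the value set `[1, 2, 3, 4]` and the guarantee `x > 2`, the driver returns a slice at offset 2 of length 2, i.e. `[3, 4]`. The slicing and the abort handling are written once, so adding a type costs one `case` and one small template instantiation.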