This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new 4818966fa0 Apply pre-selection and computation skipping to
short-circuit optimization (#15694)
4818966fa0 is described below
commit 4818966fa0950a6a471d1ea9e4b8c5808878207e
Author: LB7666 <[email protected]>
AuthorDate: Fri Apr 18 01:06:13 2025 +0800
Apply pre-selection and computation skipping to short-circuit optimization
(#15694)
* Enhance short-circuit evaluation for binary expressions
- Delay evaluation of the right-hand side (RHS) unless necessary.
- Optimize short-circuiting for `Operator::And` and `Operator::Or` by
checking LHS alone first.
- Introduce `get_short_circuit_result` function to determine short-circuit
conditions based on LHS and RHS.
- Update tests to cover various short-circuit scenarios for both `AND` and
`OR` operations.
* refactor: rename test_check_short_circuit to
test_get_short_circuit_result and update assertions
- Renamed the test function for clarity.
- Updated assertions to use get_short_circuit_result instead of
check_short_circuit.
- Added additional test cases for AND and OR operations with expected
results.
* fix: enhance short-circuit evaluation logic in get_short_circuit_result
function for null
- Updated AND and OR short-circuit conditions to only trigger when all
values are either false or true, respectively, and there are no nulls in the
array.
- Adjusted test case to reflect the change in expected output.
* feat: add debug logging for binary expression evaluation and
short-circuit checks
* fix: improve short-circuit evaluation logic in BinaryExpr to ensure RHS
is only evaluated when necessary
* fix: restrict short-circuit evaluation to logical operators in
get_short_circuit_result function
* add more println!("==> ");
* fix: remove duplicate data type checks for left and right operands in
BinaryExpr evaluation
* feat: add debug prints for dictionary values and keys in binary
expression tests
* Tests pass
* fix: remove redundant short-circuit evaluation check in BinaryExpr and
enhance documentation for get_short_circuit_result
* refactor: remove unnecessary debug prints and streamline short-circuit
evaluation in BinaryExpr
* test: enhance short-circuit evaluation tests for nullable and scalar
values in BinaryExpr
* add benchmark
* refactor: improve short-circuit logic in BinaryExpr for logical operators
- Renamed `arg` to `lhs` for clarity in the `get_short_circuit_result`
function.
- Updated handling of Boolean data types to return `None` for null values.
- Simplified short-circuit checks for AND/OR operations by consolidating
logic.
- Enhanced readability and maintainability of the code by restructuring
match statements.
* refactor: enhance short-circuit evaluation strategy in BinaryExpr to
optimize logical operations
* Revert "refactor: enhance short-circuit evaluation strategy in BinaryExpr
to optimize logical operations"
This reverts commit a62df471e10ea4e53c7438c768ed273d1d344d22.
* bench: add benchmark for OR operation with all false values in
short-circuit evaluation
* refactor: add ShortCircuitStrategy enum to optimize short-circuit
evaluation in BinaryExpr
- Replaced the lazy evaluation of the right-hand side (RHS) with immediate
evaluation based on short-circuiting logic.
- Introduced a new function `check_short_circuit` to determine if
short-circuiting can be applied for logical operators.
- Updated the logic to return early for `Operator::And` and `Operator::Or`
based on the evaluation of the left-hand side (LHS) and the conditions of the
RHS.
- Improved clarity and efficiency of the short-circuit evaluation process
by eliminating unnecessary evaluations.
* refactor: simplify short-circuit evaluation logic in check_short_circuit
function
* datafusion_expr::lit as expr_lit
* refactor: optimize short-circuit evaluation in check_short_circuit
function
- Simplified logic for AND/OR operations by prioritizing false/true counts
to enhance performance.
- Updated documentation to reflect changes in array handling techniques.
* refactor: add count_boolean_values helper function and optimize
check_short_circuit logic
- Introduced a new helper function `count_boolean_values` to count true and
false values in a BooleanArray, improving readability and performance.
- Updated `check_short_circuit` to utilize the new helper function for
counting, reducing redundant operations and enhancing clarity in the evaluation
logic for AND/OR operations.
- Adjusted comments for better understanding of the short-circuiting
conditions based on the new counting mechanism.
* Revert "refactor: add count_boolean_values helper function and optimize
check_short_circuit logic"
This reverts commit e2b9f777222759a530bd539ab0453c100b2699b4.
* optimise evaluate
* optimise evaluate 2
* refactor op:AND, lhs all false op:OR, lhs all true to be faster
* fix clippy warning
* refactor: optimize short-circuit evaluation logic in check_short_circuit
function
* fix clippy warning
* add pre selection
* add some comments
* [WIP] fix pre-selection result
* fix: Error in calculating the ratio
* fix: Correct typo in pre-selection threshold constant and improve
pre-selection scatter function documentation
* fix doctest error
* fix cargo doc
* fix cargo doc
* test: Add unit tests for pre_selection_scatter function
---------
Co-authored-by: Siew Kam Onn <[email protected]>
---
datafusion/physical-expr/benches/binary_op.rs | 17 +-
datafusion/physical-expr/src/expressions/binary.rs | 509 ++++++++++++++++++---
2 files changed, 458 insertions(+), 68 deletions(-)
diff --git a/datafusion/physical-expr/benches/binary_op.rs
b/datafusion/physical-expr/benches/binary_op.rs
index 59a602df05..216d8a520e 100644
--- a/datafusion/physical-expr/benches/binary_op.rs
+++ b/datafusion/physical-expr/benches/binary_op.rs
@@ -126,14 +126,25 @@ fn generate_boolean_cases<const TEST_ALL_FALSE: bool>(
));
}
+ // Scenario 7: Test all true or all false in AND/OR
+ // This situation won't cause a short circuit, but it can skip the bool
calculation
+ if TEST_ALL_FALSE {
+ let all_true = vec![true; len];
+ cases.push(("all_true_in_and".to_string(),
BooleanArray::from(all_true)));
+ } else {
+ let all_false = vec![false; len];
+ cases.push(("all_false_in_or".to_string(),
BooleanArray::from(all_false)));
+ }
+
cases
}
/// Benchmarks AND/OR operator short-circuiting by evaluating complex regex
conditions.
///
-/// Creates 6 test scenarios per operator:
+/// Creates 7 test scenarios per operator:
/// 1. All values enable short-circuit (all_true/all_false)
/// 2. 2-6 Single true/false value at different positions to measure early exit
+/// 3. Test all true or all false in AND/OR
///
/// You can run this benchmark with:
/// ```sh
@@ -203,7 +214,7 @@ fn benchmark_binary_op_in_short_circuit(c: &mut Criterion) {
// Each scenario when the test operator is `and`
{
- for (name, batch) in batches_and {
+ for (name, batch) in batches_and.into_iter() {
c.bench_function(&format!("short_circuit/and/{}", name), |b| {
b.iter(|| expr_and.evaluate(black_box(&batch)).unwrap())
});
@@ -211,7 +222,7 @@ fn benchmark_binary_op_in_short_circuit(c: &mut Criterion) {
}
// Each scenario when the test operator is `or`
{
- for (name, batch) in batches_or {
+ for (name, batch) in batches_or.into_iter() {
c.bench_function(&format!("short_circuit/or/{}", name), |b| {
b.iter(|| expr_or.evaluate(black_box(&batch)).unwrap())
});
diff --git a/datafusion/physical-expr/src/expressions/binary.rs
b/datafusion/physical-expr/src/expressions/binary.rs
index 84374f4a29..6c68d11e2c 100644
--- a/datafusion/physical-expr/src/expressions/binary.rs
+++ b/datafusion/physical-expr/src/expressions/binary.rs
@@ -29,7 +29,9 @@ use arrow::compute::kernels::boolean::{and_kleene, not,
or_kleene};
use arrow::compute::kernels::cmp::*;
use arrow::compute::kernels::comparison::{regexp_is_match,
regexp_is_match_scalar};
use arrow::compute::kernels::concat_elements::concat_elements_utf8;
-use arrow::compute::{cast, ilike, like, nilike, nlike};
+use arrow::compute::{
+ cast, filter_record_batch, ilike, like, nilike, nlike, SlicesIterator,
+};
use arrow::datatypes::*;
use arrow::error::ArrowError;
use datafusion_common::cast::as_boolean_array;
@@ -358,11 +360,24 @@ impl PhysicalExpr for BinaryExpr {
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
use arrow::compute::kernels::numeric::*;
+ // Evaluate left-hand side expression.
let lhs = self.left.evaluate(batch)?;
- // Optimize for short-circuiting `Operator::And` or `Operator::Or`
operations and return early.
- if check_short_circuit(&lhs, &self.op) {
- return Ok(lhs);
+ // Check if we can apply short-circuit evaluation.
+ match check_short_circuit(&lhs, &self.op) {
+ ShortCircuitStrategy::None => {}
+ ShortCircuitStrategy::ReturnLeft => return Ok(lhs),
+ ShortCircuitStrategy::ReturnRight => {
+ let rhs = self.right.evaluate(batch)?;
+ return Ok(rhs);
+ }
+ ShortCircuitStrategy::PreSelection(selection) => {
+ // The function `evaluate_selection` was not called for
filtering and calculation,
+ // as it takes into account cases where the selection contains
null values.
+ let batch = filter_record_batch(batch, selection)?;
+ let right_ret = self.right.evaluate(&batch)?;
+ return pre_selection_scatter(selection, right_ret);
+ }
}
let rhs = self.right.evaluate(batch)?;
@@ -405,23 +420,19 @@ impl PhysicalExpr for BinaryExpr {
let result_type = self.data_type(input_schema)?;
- // Attempt to use special kernels if one input is scalar and the other
is an array
- let scalar_result = match (&lhs, &rhs) {
- (ColumnarValue::Array(array), ColumnarValue::Scalar(scalar)) => {
- // if left is array and right is literal(not NULL) - use
scalar operations
- if scalar.is_null() {
- None
- } else {
- self.evaluate_array_scalar(array, scalar.clone())?.map(|r|
{
- r.and_then(|a| to_result_type_array(&self.op, a,
&result_type))
- })
+ // If the left-hand side is an array and the right-hand side is a
non-null scalar, try the optimized kernel.
+ if let (ColumnarValue::Array(array), ColumnarValue::Scalar(ref
scalar)) =
+ (&lhs, &rhs)
+ {
+ if !scalar.is_null() {
+ if let Some(result_array) =
+ self.evaluate_array_scalar(array, scalar.clone())?
+ {
+ let final_array = result_array
+ .and_then(|a| to_result_type_array(&self.op, a,
&result_type));
+ return final_array.map(ColumnarValue::Array);
}
}
- (_, _) => None, // default to array implementation
- };
-
- if let Some(result) = scalar_result {
- return result.map(ColumnarValue::Array);
}
// if both arrays or both literals - extract arrays and continue
execution
@@ -811,58 +822,199 @@ impl BinaryExpr {
}
}
+enum ShortCircuitStrategy<'a> {
+ None,
+ ReturnLeft,
+ ReturnRight,
+ PreSelection(&'a BooleanArray),
+}
+
+/// Based on the results calculated from the left side of the short-circuit
operation,
+/// if the proportion of `true` is less than 0.2 and the current operation is
an `and`,
+/// the `RecordBatch` will be filtered in advance.
+const PRE_SELECTION_THRESHOLD: f32 = 0.2;
+
/// Checks if a logical operator (`AND`/`OR`) can short-circuit evaluation
based on the left-hand side (lhs) result.
///
-/// Short-circuiting occurs when evaluating the right-hand side (rhs) becomes
unnecessary:
-/// - For `AND`: if ALL values in `lhs` are `false`, the expression must be
`false` regardless of rhs.
-/// - For `OR`: if ALL values in `lhs` are `true`, the expression must be
`true` regardless of rhs.
-///
-/// Returns `true` if short-circuiting is possible, `false` otherwise.
-///
+/// Short-circuiting occurs under these circumstances:
+/// - For `AND`:
+/// - if LHS is all false => short-circuit → return LHS
+/// - if LHS is all true => short-circuit → return RHS
+/// - if LHS is mixed and true_count/sum_count <=
[`PRE_SELECTION_THRESHOLD`] -> pre-selection
+/// - For `OR`:
+/// - if LHS is all true => short-circuit → return LHS
+/// - if LHS is all false => short-circuit → return RHS
/// # Arguments
-/// * `arg` - The left-hand side (lhs) columnar value (array or scalar)
+/// * `lhs` - The left-hand side (lhs) columnar value (array or scalar)
+/// * `lhs` - The left-hand side (lhs) columnar value (array or scalar)
/// * `op` - The logical operator (`AND` or `OR`)
///
/// # Implementation Notes
/// 1. Only works with Boolean-typed arguments (other types automatically
return `false`)
/// 2. Handles both scalar values and array values
-/// 3. For arrays, uses optimized `true_count()`/`false_count()` methods from
arrow-rs.
-/// `bool_or`/`bool_and` maybe a better choice too,for detailed
discussion,see:[link](https://github.com/apache/datafusion/pull/15462#discussion_r2020558418)
-fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool {
- let data_type = arg.data_type();
- match (data_type, op) {
- (DataType::Boolean, Operator::And) => {
- match arg {
- ColumnarValue::Array(array) => {
- if let Ok(array) = as_boolean_array(&array) {
- return array.false_count() == array.len();
- }
+/// 3. For arrays, uses optimized bit counting techniques for boolean arrays
+fn check_short_circuit<'a>(
+ lhs: &'a ColumnarValue,
+ op: &Operator,
+) -> ShortCircuitStrategy<'a> {
+ // Quick reject for non-logical operators,and quick judgment when op is and
+ let is_and = match op {
+ Operator::And => true,
+ Operator::Or => false,
+ _ => return ShortCircuitStrategy::None,
+ };
+
+ // Non-boolean types can't be short-circuited
+ if lhs.data_type() != DataType::Boolean {
+ return ShortCircuitStrategy::None;
+ }
+
+ match lhs {
+ ColumnarValue::Array(array) => {
+ // Fast path for arrays - try to downcast to boolean array
+ if let Ok(bool_array) = as_boolean_array(array) {
+ // Arrays with nulls can't be short-circuited
+ if bool_array.null_count() > 0 {
+ return ShortCircuitStrategy::None;
}
- ColumnarValue::Scalar(scalar) => {
- if let ScalarValue::Boolean(Some(value)) = scalar {
- return !value;
+
+ let len = bool_array.len();
+ if len == 0 {
+ return ShortCircuitStrategy::None;
+ }
+
+ let true_count = bool_array.values().count_set_bits();
+ if is_and {
+ // For AND, prioritize checking for all-false (short
circuit case)
+ // Uses optimized false_count() method provided by Arrow
+
+ // Short circuit if all values are false
+ if true_count == 0 {
+ return ShortCircuitStrategy::ReturnLeft;
+ }
+
+ // If no false values, then all must be true
+ if true_count == len {
+ return ShortCircuitStrategy::ReturnRight;
+ }
+
+ // determine if we can pre-selection
+ if true_count as f32 / len as f32 <=
PRE_SELECTION_THRESHOLD {
+ return ShortCircuitStrategy::PreSelection(bool_array);
+ }
+ } else {
+ // For OR, prioritize checking for all-true (short circuit
case)
+ // Uses optimized true_count() method provided by Arrow
+
+ // Short circuit if all values are true
+ if true_count == len {
+ return ShortCircuitStrategy::ReturnLeft;
+ }
+
+ // If no true values, then all must be false
+ if true_count == 0 {
+ return ShortCircuitStrategy::ReturnRight;
}
}
}
- false
}
- (DataType::Boolean, Operator::Or) => {
- match arg {
- ColumnarValue::Array(array) => {
- if let Ok(array) = as_boolean_array(&array) {
- return array.true_count() == array.len();
- }
- }
- ColumnarValue::Scalar(scalar) => {
- if let ScalarValue::Boolean(Some(value)) = scalar {
- return *value;
- }
+ ColumnarValue::Scalar(scalar) => {
+ // Fast path for scalar values
+ if let ScalarValue::Boolean(Some(is_true)) = scalar {
+ // Return Left for:
+ // - AND with false value
+ // - OR with true value
+ if (is_and && !is_true) || (!is_and && *is_true) {
+ return ShortCircuitStrategy::ReturnLeft;
+ } else {
+ return ShortCircuitStrategy::ReturnRight;
}
}
- false
}
- _ => false,
}
+
+ // If we can't short-circuit, indicate that normal evaluation should
continue
+ ShortCircuitStrategy::None
+}
+
+/// Creates a new boolean array based on the evaluation of the right
expression,
+/// but only for positions where the left_result is true.
+///
+/// This function is used for short-circuit evaluation optimization of logical
AND operations:
+/// - When left_result has few true values, we only evaluate the right
expression for those positions
+/// - Values are copied from right_array where left_result is true
+/// - All other positions are filled with false values
+///
+/// # Parameters
+/// - `left_result` Boolean array with selection mask (typically from left
side of AND)
+/// - `right_result` Result of evaluating right side of expression (only for
selected positions)
+///
+/// # Returns
+/// A combined ColumnarValue with values from right_result where left_result
is true
+///
+/// # Example
+/// Initial Data: { 1, 2, 3, 4, 5 }
+/// Left Evaluation
+/// (Condition: Equal to 2 or 3)
+/// ↓
+/// Filtered Data: {2, 3}
+/// Left Bitmap: { 0, 1, 1, 0, 0 }
+/// ↓
+/// Right Evaluation
+/// (Condition: Even numbers)
+/// ↓
+/// Right Data: { 2 }
+/// Right Bitmap: { 1, 0 }
+/// ↓
+/// Combine Results
+/// Final Bitmap: { 0, 1, 0, 0, 0 }
+///
+/// # Note
+/// Perhaps it would be better to modify `left_result` directly without
creating a copy?
+/// In practice, `left_result` should have only one owner, so making changes
should be safe.
+/// However, this is difficult to achieve under the immutable constraints of
[`Arc`] and [`BooleanArray`].
+fn pre_selection_scatter(
+ left_result: &BooleanArray,
+ right_result: ColumnarValue,
+) -> Result<ColumnarValue> {
+ let right_boolean_array = match &right_result {
+ ColumnarValue::Array(array) => array.as_boolean(),
+ ColumnarValue::Scalar(_) => return Ok(right_result),
+ };
+
+ let result_len = left_result.len();
+
+ let mut result_array_builder = BooleanArray::builder(result_len);
+
+ // keep track of current position we have in right boolean array
+ let mut right_array_pos = 0;
+
+ // keep track of how much is filled
+ let mut last_end = 0;
+ SlicesIterator::new(left_result).for_each(|(start, end)| {
+ // the gap needs to be filled with false
+ if start > last_end {
+ result_array_builder.append_n(start - last_end, false);
+ }
+
+ // copy values from right array for this slice
+ let len = end - start;
+ right_boolean_array
+ .slice(right_array_pos, len)
+ .iter()
+ .for_each(|v| result_array_builder.append_option(v));
+
+ right_array_pos += len;
+ last_end = end;
+ });
+
+ // Fill any remaining positions with false
+ if last_end < result_len {
+ result_array_builder.append_n(result_len - last_end, false);
+ }
+ let boolean_result = result_array_builder.finish();
+
+ Ok(ColumnarValue::Array(Arc::new(boolean_result)))
}
fn concat_elements(left: Arc<dyn Array>, right: Arc<dyn Array>) ->
Result<ArrayRef> {
@@ -919,10 +1071,14 @@ pub fn similar_to(
mod tests {
use super::*;
use crate::expressions::{col, lit, try_cast, Column, Literal};
+ use datafusion_expr::lit as expr_lit;
use datafusion_common::plan_datafusion_err;
use datafusion_physical_expr_common::physical_expr::fmt_sql;
+ use crate::planner::logical2physical;
+ use arrow::array::BooleanArray;
+ use datafusion_expr::col as logical_col;
/// Performs a binary operation, applying any type coercion necessary
fn binary_op(
left: Arc<dyn PhysicalExpr>,
@@ -4895,9 +5051,7 @@ mod tests {
#[test]
fn test_check_short_circuit() {
- use crate::planner::logical2physical;
- use datafusion_expr::col as logical_col;
- use datafusion_expr::lit;
+ // Test with non-nullable arrays
let schema = Arc::new(Schema::new(vec![
Field::new("a", DataType::Int32, false),
Field::new("b", DataType::Int32, false),
@@ -4911,20 +5065,245 @@ mod tests {
.unwrap();
// op: AND left: all false
- let left_expr = logical2physical(&logical_col("a").eq(lit(2)),
&schema);
+ let left_expr = logical2physical(&logical_col("a").eq(expr_lit(2)),
&schema);
let left_value = left_expr.evaluate(&batch).unwrap();
- assert!(check_short_circuit(&left_value, &Operator::And));
+ assert!(matches!(
+ check_short_circuit(&left_value, &Operator::And),
+ ShortCircuitStrategy::ReturnLeft
+ ));
+
// op: AND left: not all false
- let left_expr = logical2physical(&logical_col("a").eq(lit(3)),
&schema);
+ let left_expr = logical2physical(&logical_col("a").eq(expr_lit(3)),
&schema);
let left_value = left_expr.evaluate(&batch).unwrap();
- assert!(!check_short_circuit(&left_value, &Operator::And));
+ let ColumnarValue::Array(array) = &left_value else {
+ panic!("Expected ColumnarValue::Array");
+ };
+ let ShortCircuitStrategy::PreSelection(value) =
+ check_short_circuit(&left_value, &Operator::And)
+ else {
+ panic!("Expected ShortCircuitStrategy::PreSelection");
+ };
+ let expected_boolean_arr: Vec<_> =
+ as_boolean_array(array).unwrap().iter().collect();
+ let boolean_arr: Vec<_> = value.iter().collect();
+ assert_eq!(expected_boolean_arr, boolean_arr);
+
// op: OR left: all true
- let left_expr = logical2physical(&logical_col("a").gt(lit(0)),
&schema);
+ let left_expr = logical2physical(&logical_col("a").gt(expr_lit(0)),
&schema);
let left_value = left_expr.evaluate(&batch).unwrap();
- assert!(check_short_circuit(&left_value, &Operator::Or));
+ assert!(matches!(
+ check_short_circuit(&left_value, &Operator::Or),
+ ShortCircuitStrategy::ReturnLeft
+ ));
+
// op: OR left: not all true
- let left_expr = logical2physical(&logical_col("a").gt(lit(2)),
&schema);
+ let left_expr: Arc<dyn PhysicalExpr> =
+ logical2physical(&logical_col("a").gt(expr_lit(2)), &schema);
let left_value = left_expr.evaluate(&batch).unwrap();
- assert!(!check_short_circuit(&left_value, &Operator::Or));
+ assert!(matches!(
+ check_short_circuit(&left_value, &Operator::Or),
+ ShortCircuitStrategy::None
+ ));
+
+ // Test with nullable arrays and null values
+ let schema_nullable = Arc::new(Schema::new(vec![
+ Field::new("c", DataType::Boolean, true),
+ Field::new("d", DataType::Boolean, true),
+ ]));
+
+ // Create arrays with null values
+ let c_array = Arc::new(BooleanArray::from(vec![
+ Some(true),
+ Some(false),
+ None,
+ Some(true),
+ None,
+ ])) as ArrayRef;
+ let d_array = Arc::new(BooleanArray::from(vec![
+ Some(false),
+ Some(true),
+ Some(false),
+ None,
+ Some(true),
+ ])) as ArrayRef;
+
+ let batch_nullable = RecordBatch::try_new(
+ Arc::clone(&schema_nullable),
+ vec![Arc::clone(&c_array), Arc::clone(&d_array)],
+ )
+ .unwrap();
+
+ // Case: Mixed values with nulls - shouldn't short-circuit for AND
+ let mixed_nulls = logical2physical(&logical_col("c"),
&schema_nullable);
+ let mixed_nulls_value = mixed_nulls.evaluate(&batch_nullable).unwrap();
+ assert!(matches!(
+ check_short_circuit(&mixed_nulls_value, &Operator::And),
+ ShortCircuitStrategy::None
+ ));
+
+ // Case: Mixed values with nulls - shouldn't short-circuit for OR
+ assert!(matches!(
+ check_short_circuit(&mixed_nulls_value, &Operator::Or),
+ ShortCircuitStrategy::None
+ ));
+
+ // Test with all nulls
+ let all_nulls = Arc::new(BooleanArray::from(vec![None, None, None]))
as ArrayRef;
+ let null_batch = RecordBatch::try_new(
+ Arc::new(Schema::new(vec![Field::new("e", DataType::Boolean,
true)])),
+ vec![all_nulls],
+ )
+ .unwrap();
+
+ let null_expr = logical2physical(&logical_col("e"),
&null_batch.schema());
+ let null_value = null_expr.evaluate(&null_batch).unwrap();
+
+ // All nulls shouldn't short-circuit for AND or OR
+ assert!(matches!(
+ check_short_circuit(&null_value, &Operator::And),
+ ShortCircuitStrategy::None
+ ));
+ assert!(matches!(
+ check_short_circuit(&null_value, &Operator::Or),
+ ShortCircuitStrategy::None
+ ));
+
+ // Test with scalar values
+ // Scalar true
+ let scalar_true =
ColumnarValue::Scalar(ScalarValue::Boolean(Some(true)));
+ assert!(matches!(
+ check_short_circuit(&scalar_true, &Operator::Or),
+ ShortCircuitStrategy::ReturnLeft
+ )); // Should short-circuit OR
+ assert!(matches!(
+ check_short_circuit(&scalar_true, &Operator::And),
+ ShortCircuitStrategy::ReturnRight
+ )); // Should return the RHS for AND
+
+ // Scalar false
+ let scalar_false =
ColumnarValue::Scalar(ScalarValue::Boolean(Some(false)));
+ assert!(matches!(
+ check_short_circuit(&scalar_false, &Operator::And),
+ ShortCircuitStrategy::ReturnLeft
+ )); // Should short-circuit AND
+ assert!(matches!(
+ check_short_circuit(&scalar_false, &Operator::Or),
+ ShortCircuitStrategy::ReturnRight
+ )); // Should return the RHS for OR
+
+ // Scalar null
+ let scalar_null = ColumnarValue::Scalar(ScalarValue::Boolean(None));
+ assert!(matches!(
+ check_short_circuit(&scalar_null, &Operator::And),
+ ShortCircuitStrategy::None
+ ));
+ assert!(matches!(
+ check_short_circuit(&scalar_null, &Operator::Or),
+ ShortCircuitStrategy::None
+ ));
+ }
+
+ /// Test for [pre_selection_scatter]
+ /// Since [check_short_circuit] ensures that the left side does not
contain null and is neither all_true nor all_false, as well as not being empty,
+ /// the following tests have been designed:
+ /// 1. Test sparse left with interleaved true/false
+ /// 2. Test multiple consecutive true blocks
+ /// 3. Test multiple consecutive true blocks
+ /// 4. Test single true at first position
+ /// 5. Test single true at last position
+ /// 6. Test nulls in right array
+ /// 7. Test scalar right handling
+ #[test]
+ fn test_pre_selection_scatter() {
+ fn create_bool_array(bools: Vec<bool>) -> BooleanArray {
+ BooleanArray::from(bools.into_iter().map(Some).collect::<Vec<_>>())
+ }
+ // Test sparse left with interleaved true/false
+ {
+ // Left: [T, F, T, F, T]
+ // Right: [F, T, F] (values for 3 true positions)
+ let left = create_bool_array(vec![true, false, true, false, true]);
+ let right = ColumnarValue::Array(Arc::new(create_bool_array(vec![
+ false, true, false,
+ ])));
+
+ let result = pre_selection_scatter(&left, right).unwrap();
+ let result_arr = result.into_array(left.len()).unwrap();
+
+ let expected = create_bool_array(vec![false, false, true, false,
false]);
+ assert_eq!(&expected, result_arr.as_boolean());
+ }
+ // Test multiple consecutive true blocks
+ {
+ // Left: [F, T, T, F, T, T, T]
+ // Right: [T, F, F, T, F]
+ let left =
+ create_bool_array(vec![false, true, true, false, true, true,
true]);
+ let right = ColumnarValue::Array(Arc::new(create_bool_array(vec![
+ true, false, false, true, false,
+ ])));
+
+ let result = pre_selection_scatter(&left, right).unwrap();
+ let result_arr = result.into_array(left.len()).unwrap();
+
+ let expected =
+ create_bool_array(vec![false, true, false, false, false, true,
false]);
+ assert_eq!(&expected, result_arr.as_boolean());
+ }
+ // Test single true at first position
+ {
+ // Left: [T, F, F]
+ // Right: [F]
+ let left = create_bool_array(vec![true, false, false]);
+ let right =
ColumnarValue::Array(Arc::new(create_bool_array(vec![false])));
+
+ let result = pre_selection_scatter(&left, right).unwrap();
+ let result_arr = result.into_array(left.len()).unwrap();
+
+ let expected = create_bool_array(vec![false, false, false]);
+ assert_eq!(&expected, result_arr.as_boolean());
+ }
+ // Test single true at last position
+ {
+ // Left: [F, F, T]
+ // Right: [F]
+ let left = create_bool_array(vec![false, false, true]);
+ let right =
ColumnarValue::Array(Arc::new(create_bool_array(vec![false])));
+
+ let result = pre_selection_scatter(&left, right).unwrap();
+ let result_arr = result.into_array(left.len()).unwrap();
+
+ let expected = create_bool_array(vec![false, false, false]);
+ assert_eq!(&expected, result_arr.as_boolean());
+ }
+ // Test nulls in right array
+ {
+ // Left: [F, T, F, T]
+ // Right: [None, Some(false)] (with null at first position)
+ let left = create_bool_array(vec![false, true, false, true]);
+ let right_arr = BooleanArray::from(vec![None, Some(false)]);
+ let right = ColumnarValue::Array(Arc::new(right_arr));
+
+ let result = pre_selection_scatter(&left, right).unwrap();
+ let result_arr = result.into_array(left.len()).unwrap();
+
+ let expected = BooleanArray::from(vec![
+ Some(false),
+ None, // null from right
+ Some(false),
+ Some(false),
+ ]);
+ assert_eq!(&expected, result_arr.as_boolean());
+ }
+ // Test scalar right handling
+ {
+ // Left: [T, F, T]
+ // Right: Scalar true
+ let left = create_bool_array(vec![true, false, true]);
+ let right =
ColumnarValue::Scalar(ScalarValue::Boolean(Some(true)));
+
+ let result = pre_selection_scatter(&left, right).unwrap();
+ assert!(matches!(result, ColumnarValue::Scalar(_)));
+ }
}
}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]