ZhangHuiGui commented on code in PR #41975:
URL: https://github.com/apache/arrow/pull/41975#discussion_r1639153037


##########
cpp/src/arrow/compute/exec.cc:
##########
@@ -1034,9 +1034,23 @@ class VectorExecutor : public 
KernelExecutorImpl<VectorKernel> {
     output_num_buffers_ = 
static_cast<int>(output_type_.type->layout().buffers.size());
 
     // Decide if we need to preallocate memory for this kernel
-    validity_preallocated_ =
-        (kernel_->null_handling != NullHandling::COMPUTED_NO_PREALLOCATE &&
-         kernel_->null_handling != NullHandling::OUTPUT_NOT_NULL);
+    validity_preallocated_ = false;
+    if (output_type_.type->id() != Type::NA) {
+      if (kernel_->null_handling == NullHandling::COMPUTED_PREALLOCATE) {
+        // Override the flag if kernel asks for pre-allocation
+        validity_preallocated_ = true;
+      } else if (kernel_->null_handling == NullHandling::INTERSECTION) {
+        bool elide_validity_bitmap = true;
+        for (const auto& arg : batch.values) {
+          auto null_gen = NullGeneralization::Get(arg) == 
NullGeneralization::ALL_VALID;
+
+          // If not all valid, this becomes false
+          elide_validity_bitmap = elide_validity_bitmap && null_gen;
+        }
+        validity_preallocated_ = !elide_validity_bitmap;
+      }
+    }

Review Comment:
   > follow a single code path that uses the pre-allocated buffer to put the 
result of the bitmaps intersection
   
   Are you talking about `PropagateNulls`? Currently, seems this logic is used 
in `INTERSECTION`.
   
   If not, i understand that if the kernel internal execute path must `follow a 
single code path that uses the pre-allocated buffer`, `COMPUTED_PREALLOCATE` 
can be set statically, which should be safer for this kernel?
   
   Back to preallocation-validity buffer, if kernel-function sets 
`NullHandling::INTERSECTION`, does it mean that kernel-function may not be sure 
whether it needs a pre-allocated validity buffer and let the Executor decide 
for itself? At this time, the Executor can only dynamically determine whether 
to set the pre-allocate buffer based on the input data.
   
   > And even if that is not the case (assuming all kernels are perfect at the 
moment), this extra logic introduces branching in the possible states of 
pre-allocated buffers based on runtime input data and not something statically 
defined in the `kernel_` when it's configured.
   
   From a performance perspective, the current logic does introduce many 
branches at runtime for `INTERSECTION`, compared to `PropagateNulls`, it 
actually passes `NullGeneralization::Get` introduces very few branches.
   
   > The generic binding and execution logic should be simple and not depend on 
values so much. It's too complicated as it is.
   
   Yes, the current pre-allocation logic for `INTERSECTION` introduced 
`NullPropagator` and `NullGeneralization` is complicated.
   
   @pitrou What do you think of this PR for VectorExecutor's `INTERSECTION` 
validity-bitmap preallocation?
   It seems that you have changed the logic here before.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to