ZhangHuiGui commented on code in PR #41975:
URL: https://github.com/apache/arrow/pull/41975#discussion_r1640761803
##########
cpp/src/arrow/compute/exec.cc:
##########
@@ -1034,9 +1034,23 @@ class VectorExecutor : public KernelExecutorImpl<VectorKernel> {
     output_num_buffers_ = static_cast<int>(output_type_.type->layout().buffers.size());
     // Decide if we need to preallocate memory for this kernel
-    validity_preallocated_ =
-        (kernel_->null_handling != NullHandling::COMPUTED_NO_PREALLOCATE &&
-         kernel_->null_handling != NullHandling::OUTPUT_NOT_NULL);
+    validity_preallocated_ = false;
+    if (output_type_.type->id() != Type::NA) {
+      if (kernel_->null_handling == NullHandling::COMPUTED_PREALLOCATE) {
+        // Override the flag if kernel asks for pre-allocation
+        validity_preallocated_ = true;
+      } else if (kernel_->null_handling == NullHandling::INTERSECTION) {
+        bool elide_validity_bitmap = true;
+        for (const auto& arg : batch.values) {
+          auto null_gen = NullGeneralization::Get(arg) == NullGeneralization::ALL_VALID;
+
+          // If not all valid, this becomes false
+          elide_validity_bitmap = elide_validity_bitmap && null_gen;
+        }
+        validity_preallocated_ = !elide_validity_bitmap;
+      }
+    }
Review Comment:
> For a kernel implementer, it's simpler to assume
`NullHandling::INTERSECTION` implies pre-allocated validity bitmap buffer. No
matter what arrays are passed as input.
I see, thank you for the explanation, @felipecrv.
Do you think the changes in the third commit of this PR are reasonable? If
so, I will revert the first two commits. The third commit mainly concerns
pre-allocation of the output data buffer, together with some refactoring, and
it brings the benefits mentioned by @mapleFU.
In addition, I will create an issue to track validity-buffer pre-allocation
in `INTERSECTION` mode, because `ScalarExecutor` currently decides whether to
pre-allocate based on `NullGeneralization` in this mode.
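
To make the two policies under discussion concrete, here is a minimal standalone sketch. It does not use the Arrow headers: the enums are hypothetical stand-ins that only mirror the names of `NullHandling` and `NullGeneralization`, and the function names `PreallocateAlways` and `PreallocateUnlessAllValid` are made up for illustration. The first function follows the removed lines of the hunk above, the second follows the added lines.

```cpp
#include <vector>

// Hypothetical stand-ins mirroring the names of Arrow's enums.
// These are NOT the real Arrow declarations.
enum class NullHandling {
  INTERSECTION,
  COMPUTED_PREALLOCATE,
  COMPUTED_NO_PREALLOCATE,
  OUTPUT_NOT_NULL
};
enum class NullGeneralization { PERHAPS_NULL, ALL_VALID, ALL_NULL };

// Policy in the removed lines: the decision depends only on the kernel's
// null handling, so INTERSECTION always gets a pre-allocated validity bitmap.
inline bool PreallocateAlways(NullHandling handling) {
  return handling != NullHandling::COMPUTED_NO_PREALLOCATE &&
         handling != NullHandling::OUTPUT_NOT_NULL;
}

// Policy in the added lines: for INTERSECTION, skip the bitmap when every
// argument is known to be all-valid. `output_is_null_type` stands in for
// the `output_type_.type->id() == Type::NA` check.
inline bool PreallocateUnlessAllValid(NullHandling handling,
                                      bool output_is_null_type,
                                      const std::vector<NullGeneralization>& args) {
  if (output_is_null_type) return false;
  if (handling == NullHandling::COMPUTED_PREALLOCATE) return true;
  if (handling != NullHandling::INTERSECTION) return false;
  for (NullGeneralization g : args) {
    // Any argument that may contain nulls forces a validity bitmap.
    if (g != NullGeneralization::ALL_VALID) return true;
  }
  return false;
}
```

Under the second policy, a kernel declared with `INTERSECTION` can no longer assume the validity buffer exists, which is the concern raised in the quoted comment.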