ZhangHuiGui commented on code in PR #41975:
URL: https://github.com/apache/arrow/pull/41975#discussion_r1637542445
##########
cpp/src/arrow/compute/exec.cc:
##########
@@ -498,15 +510,36 @@ struct NullGeneralization {
return PERHAPS_NULL;
}
+ static type Get(const ChunkedArray& chunk_array) {
+ if (chunk_array.num_chunks() == 0 ||
+ (chunk_array.null_count() == chunk_array.length())) {
+ return ALL_NULL;
+ }
+
+ type null_gen = ALL_NULL;
+ int chunk_idx = 0;
+ while (chunk_idx < chunk_array.num_chunks()) {
+ ExecValue value;
+ const ArrayData* curr_chunk = chunk_array.chunk(chunk_idx)->data().get();
+ value.SetArray(*curr_chunk);
+ null_gen = static_cast<type>((null_gen & Get(value)));
Review Comment:
> Personally I think LLVM/gcc is smart enough and this might not a
bottleneck...
There is a potential problem here. The previous calculation logic will rely
on ALL_NULL = 3, so the null_gen check for all chunks in a chunk-array can be
done with `&`.
For example: chunk-array returns `ALL_VALID` only if all chunks are
`ALL_VALID`. The calculation through `&` must rely on `ALL_NULL = 3`, which is
actually unreasonable.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]