zanmato1984 commented on code in PR #48166:
URL: https://github.com/apache/arrow/pull/48166#discussion_r2557839405


##########
cpp/src/arrow/acero/hash_join.cc:
##########
@@ -306,12 +307,33 @@ class HashJoinBasicImpl : public HashJoinImpl {
 
     size_t num_probed_rows = match.size() + no_match.size();
     if (mask.is_scalar()) {
-      const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
-      if (mask_scalar.is_valid && mask_scalar.value) {
-        // All rows passed, nothing left to do
-        return Status::OK();
+#if ARROW_LITTLE_ENDIAN
+       const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
+       if (mask_scalar.is_valid && mask_scalar.value) {
+         // All rows passed, nothing left to do
+         return Status::OK();
+#else
+      // Check if the scalar is a BooleanScalar before casting
+      if (mask.scalar()->type->id() == Type::BOOL) {

Review Comment:
   OK, mystery solved: there are two implementations of hash join - regular and 
swiss. The regular one has wider range of supported types and platforms 
(including big-endian). The swiss one is more efficient, but restrictive in 
supported types and platform (only little-endian). So on big endian platform, 
the regular one is chosen which is more strict on filter expression type. 
Little-endian platform uses the swiss one which is less strict on filter 
expression type by explicitly allowing a null-type `null` - so our CI, always 
ran on little-endian platforms, has always been passing.
   
   I'm the author of supporting residual filter in swiss join as well as the 
tests. I think allowing null-type `null` was a mistake (it was implemented at 
the beginning of my contribution to Arrow).
   
   I think I can send out an issue/pr soon to address the issues discovered 
here:
   1. The test in question needs to be revised by replacing null-type `null` 
with boolean `null`, as mentioned in my previous comment (so it will pass on 
big-endian platforms);
   2. The swiss join code need to be updated by not allowing null-type.
   3. Add `DCHECK`s to enforce the filter expression checking. (optional)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to