zanmato1984 commented on code in PR #48166:
URL: https://github.com/apache/arrow/pull/48166#discussion_r2557768858
##########
cpp/src/arrow/acero/hash_join.cc:
##########
@@ -306,12 +307,33 @@ class HashJoinBasicImpl : public HashJoinImpl {
size_t num_probed_rows = match.size() + no_match.size();
if (mask.is_scalar()) {
- const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
- if (mask_scalar.is_valid && mask_scalar.value) {
- // All rows passed, nothing left to do
- return Status::OK();
+#if ARROW_LITTLE_ENDIAN
+ const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
+ if (mask_scalar.is_valid && mask_scalar.value) {
+ // All rows passed, nothing left to do
+ return Status::OK();
+#else
+ // Check if the scalar is a BooleanScalar before casting
+ if (mask.scalar()->type->id() == Type::BOOL) {
Review Comment:
This is a very detailed explanation. Thanks @Vishwanatha-HD.
As the author of the test in question, I can say this is an issue with the
test rather than with the hash join code. That is, using a filter expression
that evaluates to a null-type `null` is invalid; it should be a boolean `null`.
It is OK for the hash join code to assume the expression is `boolean`, though
a `DCHECK_EQ(mask.type()->id(), Type::BOOL)` would be preferable.
I think we should instead fix the test by simply replacing
`literal(NullScalar())` with a boolean `null`:
`literal(MakeNullScalar(boolean()))`.
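
To illustrate the distinction, here is a minimal self-contained sketch. It does not use Arrow; `TypeId`, `Scalar`, `NullTypeScalar`, `BoolScalar`, `MakeNullBool` and `AllRowsPass` are hypothetical stand-ins invented for this example. It only models the idea that a boolean-typed null can be safely downcast and read as "nothing passes", whereas a null-typed null should be rejected by a debug-time type check before any downcast happens.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration only; these are NOT Arrow's classes.
enum class TypeId { kNull, kBool };

struct Scalar {
  Scalar(TypeId id, bool valid) : type_id(id), is_valid(valid) {}
  virtual ~Scalar() = default;
  TypeId type_id;
  bool is_valid;
};

// A null whose *type* is null: it carries no boolean payload at all.
struct NullTypeScalar : Scalar {
  NullTypeScalar() : Scalar(TypeId::kNull, /*valid=*/false) {}
};

// A boolean scalar; when is_valid is false it is a null of boolean type.
struct BoolScalar : Scalar {
  BoolScalar(bool v, bool valid) : Scalar(TypeId::kBool, valid), value(v) {}
  bool value;
};

// Builds a null of boolean type (assumed analogue of a typed null scalar).
std::shared_ptr<Scalar> MakeNullBool() {
  return std::make_shared<BoolScalar>(/*v=*/false, /*valid=*/false);
}

// Returns true when a scalar filter mask lets every row pass.
// The assert plays the role of the suggested debug check: the downcast
// below is only meaningful if the mask really is boolean.
bool AllRowsPass(const Scalar& mask) {
  assert(mask.type_id == TypeId::kBool && "filter mask must be boolean");
  const auto& b = static_cast<const BoolScalar&>(mask);
  return b.is_valid && b.value;
}
```

With these stand-ins, `AllRowsPass(*MakeNullBool())` is `false` (a boolean null filters out every row), while passing a `NullTypeScalar` trips the assertion in a debug build instead of reading a field that does not exist.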
##########
cpp/src/arrow/acero/hash_join.cc:
##########
@@ -306,12 +307,33 @@ class HashJoinBasicImpl : public HashJoinImpl {
size_t num_probed_rows = match.size() + no_match.size();
if (mask.is_scalar()) {
- const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
- if (mask_scalar.is_valid && mask_scalar.value) {
- // All rows passed, nothing left to do
- return Status::OK();
+#if ARROW_LITTLE_ENDIAN
+ const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
+ if (mask_scalar.is_valid && mask_scalar.value) {
+ // All rows passed, nothing left to do
+ return Status::OK();
+#else
+ // Check if the scalar is a BooleanScalar before casting
+ if (mask.scalar()->type->id() == Type::BOOL) {
Review Comment:
But one question remains: the test should be just as problematic on
little-endian. Why is it passing there?
I'm now looking into it.
##########
cpp/src/arrow/acero/hash_join.cc:
##########
@@ -306,19 +307,40 @@ class HashJoinBasicImpl : public HashJoinImpl {
size_t num_probed_rows = match.size() + no_match.size();
if (mask.is_scalar()) {
+#if ARROW_LITTLE_ENDIAN
const auto& mask_scalar = mask.scalar_as<BooleanScalar>();
if (mask_scalar.is_valid && mask_scalar.value) {
// All rows passed, nothing left to do
return Status::OK();
- } else {
- // Nothing passed, no_match becomes everything
- no_match.resize(num_probed_rows);
- std::iota(no_match.begin(), no_match.end(), 0);
- match_left.clear();
- match_right.clear();
- match.clear();
- return Status::OK();
}
+#else
+ // Check if the scalar is a BooleanScalar before casting
Review Comment:
Explained in my other comment down below.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.