xinyuezg commented on PR #8965:
URL: 
https://github.com/apache/incubator-gluten/pull/8965#issuecomment-3335083764

   We’ve been investigating correctness issues in `BroadcastNestedLoopJoin` 
with FULL OUTER JOIN, and traced the root cause back to this PR (#8965).
   
   This change assumes that when the join condition is empty, Velox can skip 
emitting build-side mismatches because there’s “no need.” That assumption works 
in Presto/Velox where probe and build run in a single worker, but it does not 
hold in Spark:
        •       In Spark, the probe side is distributed across multiple 
executors.
        •       **Velox cannot tell globally whether the probe side is empty or 
not, since each driver only sees its own partition.**
        •       As a result, with FULL OUTER JOIN and empty join condition, 
build-side mismatches may be suppressed incorrectly.
        •       This leads to wrong results compared to Spark’s native behavior 
(where a repartition(1) is used to guarantee correct global build-mismatch 
handling).
   
   We validated this both in our environment and by comparing against Spark’s 
implementation. The conclusion is that Velox cannot currently produce correct 
results for Spark in this case.
   
   Proposed resolution:
   We believe the correct fix is to fall back to Spark for all FULL OUTER JOIN 
cases in BNLJ, regardless of whether a join condition is specified. This 
guarantees correctness.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to