adriangb commented on code in PR #16641:
URL: https://github.com/apache/datafusion/pull/16641#discussion_r2179100056
##########
datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs:
##########
@@ -668,6 +668,15 @@ fn handle_hash_join(
plan: &HashJoinExec,
parent_required: OrderingRequirements,
) -> Result<Option<Vec<Option<OrderingRequirements>>>> {
+ // Anti-joins (LeftAnti or RightAnti) do not preserve meaningful input
order,
+ // so sorting beforehand cannot be relied on. Bail out early for both
flavors:
+ match plan.join_type() {
+ JoinType::LeftAnti | JoinType::RightAnti => {
+ return Ok(None);
+ }
+ _ => {}
+ }
Review Comment:
It makes sense that in that example no global order is guaranteed in the
output, there was no sorting being done before the anti-join! I think if you
pushed a SortExec you'd get the right order.
Having thought about it a bit I do think this is a real bug that needs
fixing, but I'm not sure it's about sorting, it's about limits: if you push the
TopK below the anti join is the limit is applied before the anti join filters
out rows, which means you'll end up getting less or the wrong rows out. For
example, if the batch into the TopK(2) is
| a | b | c |
|---|---|---|
| 2 | 3 | 5 |
| 1 | 1 | 2 |
| 2 | 4 | 1 |
It will output
| a | b | c |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 3 | 5 |
But then if the anti join filters out `c = 5` we get:
| a | b | c |
|---|---|---|
| 1 | 1 | 2 |
Which is a single row despite the `LIMIT 2`.
If instead we apply the anti-join first we get:
| a | b | c |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 4 | 1 |
Which is correct.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]