adragomir opened a new issue, #10978: URL: https://github.com/apache/datafusion/issues/10978
### Describe the bug

We ran into problems with projections inside HashJoin. Each input schema of the join (left and right) has:

* a single struct column
* the join column (a reference to a `get_field` on the struct column)

The projection is `[0, 2]`: the struct column from the left and the struct column from the right. The join column is not part of the output. When the optimizer tries to reverse the join order, the projection is swapped to `[2, 0]`. However, there is no column with index 2 in the output, because the output contains only the two structs.

### To Reproduce

* Create two schemas with a single struct column `(key, value)`
* Join on the `key`
* Request the two `value` fields

### Expected behavior

The hash join optimization works, even when swapping the join order (and wrapping in a `ProjectionExec`).

### Additional context

The [comment for HashJoinExec::projection](https://github.com/apache/datafusion/blob/ac161bba336d098eab46f666af4664de7e8cd29f/datafusion/physical-plan/src/joins/hash_join.rs#L318) says `The projection indices of the columns in the output schema of join`, however:

* inside `try_new` it seems to be [checked against the join schema](https://github.com/apache/datafusion/blob/ac161bba336d098eab46f666af4664de7e8cd29f/datafusion/physical-plan/src/joins/hash_join.rs#L363)
* inside `with_projection` it seems to be [checked against the output schema](https://github.com/apache/datafusion/blob/ac161bba336d098eab46f666af4664de7e8cd29f/datafusion/physical-plan/src/joins/hash_join.rs#L453)
* it also seems to be treated as relative to the join schema [inside the `swap_join_projection` function](https://github.com/apache/datafusion/blob/ac161bba336d098eab46f666af4664de7e8cd29f/datafusion/core/src/physical_optimizer/join_selection.rs#L140), as it uses the left and right schemas

I tried taking a stab at it, but it is unclear which schema the indices passed in `projection` are meant to be relative to.
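To make the index mismatch concrete, here is a minimal standalone sketch that uses plain slices instead of DataFusion types. The column names and the helpers `swap_projection` and `project` are hypothetical and exist only for illustration; the remapping rule (shift left indices by the right width and right indices by the left width) is an assumption chosen because it reproduces the `[2, 0]` seen above. It is not the actual DataFusion code.

```rust
/// Hypothetical helper: remap projection indices that are relative to the
/// join schema (left columns followed by right columns) after swapping inputs.
fn swap_projection(left_len: usize, right_len: usize, projection: &[usize]) -> Vec<usize> {
    projection
        .iter()
        .map(|&i| if i < left_len { i + right_len } else { i - left_len })
        .collect()
}

/// Hypothetical helper: apply a projection to a schema, returning `None`
/// if any index is out of bounds.
fn project<'a>(schema: &[&'a str], projection: &[usize]) -> Option<Vec<&'a str>> {
    projection.iter().map(|&i| schema.get(i).copied()).collect()
}

fn main() {
    // Illustrative column names: one struct column and one join key per side.
    let left = ["l.s", "l.key"];
    let right = ["r.s", "r.key"];

    // Join schema before the swap: left columns, then right columns.
    let join_schema: Vec<&str> = left.iter().chain(right.iter()).copied().collect();
    let projection = [0, 2];
    let output = project(&join_schema, &projection).expect("indices are in bounds");
    assert_eq!(output, vec!["l.s", "r.s"]);

    // After the swap the join schema is right columns, then left columns.
    let swapped_schema: Vec<&str> = right.iter().chain(left.iter()).copied().collect();
    let swapped = swap_projection(left.len(), right.len(), &projection);
    assert_eq!(swapped, vec![2, 0]);

    // Relative to the swapped join schema, the remapped indices are valid.
    assert_eq!(project(&swapped_schema, &swapped), Some(vec!["l.s", "r.s"]));

    // Relative to the two-column output schema, index 2 does not exist.
    assert_eq!(project(&output, &swapped), None);
}
```

The same indices are valid or out of bounds depending on which schema they are interpreted against, which is why the meaning of `projection` needs to be pinned down.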
For now, I am fixing it surgically when swapping the order - I am rewriting the projections to be relative to the output schema when [wrapping the join with a `ProjectionExec`](https://github.com/apache/datafusion/blob/ac161bba336d098eab46f666af4664de7e8cd29f/datafusion/core/src/physical_optimizer/join_selection.rs#L196)