LiaCastaneda commented on PR #19761: URL: https://github.com/apache/datafusion/pull/19761#issuecomment-3935106429
👋 Something I noticed while using `BufferExec` in our service: when the build side of a `HashJoinExec` (`INNER` join) returns 0 rows, the probe side is still fully consumed. IIUC, the short-circuit [here](https://github.com/apache/datafusion/blob/ace9cd44b7356d60e6d69d0b98ac3f5606d55507/datafusion/physical-plan/src/joins/hash_join/stream.rs#L647) only skips the hash join lookup work, but `fetch_probe_batch` still runs for every probe batch until the stream is exhausted. I think this happens regardless of whether there is a `BufferExec` or not.

Would it make sense to detect an empty build side right after `collect_build_side` completes and, for join types where an empty build side implies empty output, drop the probe stream immediately and jump to `Completed`?
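To make the idea concrete, here is a rough, standalone sketch of the check. It is a toy model, not DataFusion code: `JoinType`, `empty_build_means_empty_output`, and `probe_batches_pulled` are hypothetical names, the probe input is a plain iterator rather than a record batch stream, and the list of join types assumes the build side is the left input.

```rust
// Toy model only: none of these names are DataFusion APIs.
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    RightSemi,
    RightAnti,
}

/// Assumption: the build side is the left input. Returns true when an
/// empty build side forces the join output to be empty as well.
/// Right, Full, and RightAnti are excluded because they can still emit
/// probe-side rows when nothing matches.
fn empty_build_means_empty_output(join_type: JoinType) -> bool {
    matches!(
        join_type,
        JoinType::Inner
            | JoinType::Left
            | JoinType::LeftSemi
            | JoinType::LeftAnti
            | JoinType::RightSemi
    )
}

/// Counts how many probe batches get pulled. With `short_circuit` set,
/// an empty build side on a qualifying join type drops the probe input
/// without polling it, which stands in for jumping to `Completed`.
fn probe_batches_pulled<I>(
    join_type: JoinType,
    build_row_count: usize,
    probe: I,
    short_circuit: bool,
) -> usize
where
    I: Iterator<Item = Vec<i32>>,
{
    if short_circuit
        && build_row_count == 0
        && empty_build_means_empty_output(join_type)
    {
        drop(probe); // release the probe input early
        return 0;
    }

    // Current behaviour as described above: every probe batch is
    // fetched until the input is exhausted.
    let mut pulled = 0;
    for _batch in probe {
        pulled += 1;
    }
    pulled
}
```

In this model, an `Inner` join with zero build rows pulls no probe batches once the check is enabled, while a `Right` join still drains the probe input because it may need to emit unmatched probe rows.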
