Omega359 commented on PR #18589: URL: https://github.com/apache/datafusion/pull/18589#issuecomment-3538860250
> Thank you for working on it. I think memory-limited equal join is a solved problem through sort merge join, though there are some outstanding work to further improve it. I am concerned about your focus solely on performance here. I don't care how fast an algo is if I need a machine with 512GB+ of ram to run a join. DF runs out of memory a lot for my coworkers for joins to the point that one of my coworkers wrote our own disk based external join because neither DF nor DuckDB could do the join without OOM. While grace hash join may potentially be slower overall because of the writing to disk, possible re-partitioning if the initial # of partitions was too low, etc, it also is pretty much guaranteed to not require large amounts of memory no matter what the shape of the left and ride sides of the join are. Benchmarks of course would be awesome to have, no disagreement there. However, I would add that memory usage would be just as important a factor as performance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
