2010YOUY01 commented on PR #18589: URL: https://github.com/apache/datafusion/pull/18589#issuecomment-3514582015
Thank you for working on this. I think memory-limited equi-join is a solved problem through sort merge join (SMJ), though there is some outstanding work to further improve it. Grace Hash Join (GHJ) is trickier to configure because we have to decide the number of partitions up front, and it's not obvious to me that it's more efficient than SMJ once spilling is involved. I suggest the first step be implementing a comprehensive benchmark for external joins, then demonstrating that GHJ has a clear performance win over SMJ; after that we can carry out this project.
