jonathanc-n commented on issue #17267: URL: https://github.com/apache/datafusion/issues/17267#issuecomment-3209074353
Yes, that is a bit difficult as well: we don't know the size up front, and the number of partitions is usually derived from the memory size and the memory limit. It will just be a configurable variable.

> I recommend to get sort merge join working reliably before experimenting HJ spilling (i.e. benchmarks should be able to finish under a modest memory limit, perhaps also more tests), the existing solution is not production ready yet, but I think SMJ should have lower maintenance overhead -- its core is reusing the external sort implementation.

Is there a reason we shouldn't do it in parallel? I will only start after #17260 is finished.
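To make the partition-count point concrete, here is a rough sketch of the trade-off: when a size estimate exists, the count could be derived from it and the memory limit; when it doesn't (the case described above), we fall back to the configurable value. The function name and all parameters are hypothetical and are not part of DataFusion's API.

```rust
/// Hypothetical helper (not a DataFusion API): pick how many spill
/// partitions a hash join build side could use.
///
/// - `estimated_build_bytes`: size estimate of the build side, if any.
/// - `memory_limit_bytes`: memory budget available to the join.
/// - `configured_partitions`: user-configured fallback value.
fn choose_partition_count(
    estimated_build_bytes: Option<u64>,
    memory_limit_bytes: u64,
    configured_partitions: usize,
) -> usize {
    match estimated_build_bytes {
        // Size known and limit usable: enough partitions that each one
        // fits within the limit (ceiling division, at least 1).
        Some(size) if memory_limit_bytes > 0 => {
            let full = size / memory_limit_bytes;
            let extra = u64::from(size % memory_limit_bytes != 0);
            (full + extra).max(1) as usize
        }
        // Size unknown (or no usable limit): use the configured value.
        _ => configured_partitions.max(1),
    }
}
```

For example, `choose_partition_count(Some(10), 4, 16)` gives 3 partitions, while `choose_partition_count(None, 4, 16)` falls back to the configured 16.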
