jonathanc-n commented on issue #17267:
URL: https://github.com/apache/datafusion/issues/17267#issuecomment-3209074353

   Yes, that is a bit difficult as well: we don't know the size up front, and
usually the number of partitions is based on the memory size and the memory
limit. It will just be a configurable variable.
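   To make the trade-off concrete, here is a minimal sketch under stated assumptions. When the build-side size is known, the partition count is ceil(size / memory limit). When it is not known, which is the usual case, it falls back to a user-configured value. `spill_partition_count` and its parameters are hypothetical and are not a DataFusion API. Rounding up to a power of two is also an assumption, chosen so partitions can be selected with a hash bit mask.

```rust
/// Hypothetical helper (not a DataFusion API): choose how many spill
/// partitions a grace-style hash join would use.
fn spill_partition_count(
    estimated_build_bytes: Option<usize>,
    memory_limit_bytes: usize,
    configured_partitions: usize,
) -> usize {
    match estimated_build_bytes {
        // Size known: enough partitions that each one fits in the memory limit,
        // at least 1, rounded up to a power of two (an assumption for hash masking).
        Some(size) if memory_limit_bytes > 0 => {
            size.div_ceil(memory_limit_bytes).max(1).next_power_of_two()
        }
        // Size unknown (or no limit): fall back to the configurable value.
        _ => configured_partitions.max(1),
    }
}

fn main() {
    // 10 GiB build side under a 1 GiB limit: ceil(10) = 10, rounded up to 16.
    println!("{}", spill_partition_count(Some(10 << 30), 1 << 30, 32));
    // Unknown build size: use the configured value.
    println!("{}", spill_partition_count(None, 1 << 30, 32));
}
```

   Because the size usually isn't known before the build side is consumed, the configured fallback is the path that would normally be taken.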
   
   > I recommend getting sort merge join working reliably before experimenting
with HJ spilling (i.e. benchmarks should be able to finish under a modest memory
limit, perhaps also more tests). The existing solution is not production ready
yet, but I think SMJ should have lower maintenance overhead: its core is
reusing the external sort implementation.
   
   Is there a reason we shouldn't do it in parallel? I will only start after 
#17260 is finished.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
