2010YOUY01 commented on PR #15700:
URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3034804482

   https://github.com/apache/datafusion/pull/15700#discussion_r2041372025
   
   I have an idea to address this concern: add a max merge degree
configuration. If either
   a. SPM's estimated memory exceeds the budget, or
   b. the configured max merge degree has been reached,
   do a re-spill.
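
A minimal sketch of the check described above, to make the two conditions concrete. The struct, field, and function names here are hypothetical and are not existing DataFusion options or APIs; the memory estimate is assumed to be computed elsewhere.

```rust
/// Hypothetical limits; neither field is an existing DataFusion config key.
struct MergeLimits {
    /// Memory budget available to the merge, in bytes.
    memory_budget_bytes: usize,
    /// Hard cap on how many sorted runs one merge may read at once.
    max_merge_degree: usize,
}

/// Returns true when the runs selected so far should be merged and
/// re-spilled instead of adding more inputs to the same merge.
fn should_respill(
    limits: &MergeLimits,
    estimated_merge_bytes: usize,
    selected_runs: usize,
) -> bool {
    // (a) the estimated memory for the merge exceeds the budget
    let over_budget = estimated_merge_bytes > limits.memory_budget_bytes;
    // (b) the configured max merge degree has been reached
    let degree_reached = selected_runs >= limits.max_merge_degree;
    over_budget || degree_reached
}
```

Because condition (b) does not depend on the estimate, it still triggers when the estimate is too optimistic.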
   
   I think this approach has two advantages:
   1. If batch sizes bloat after a spill and read-back round trip (see 
https://github.com/apache/datafusion/pull/15700#discussion_r2041372025), a 
hard merge degree limit overrides the memory estimate, so the query can still 
finish.
   2. It also helps tune for speed: even if we have enough memory to perform a 
very wide merge, limiting it to a narrower merge is still likely to run faster.
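
For a sense of what a degree cap costs in extra passes, here is a rough sketch. It assumes a simple level-by-level strategy in which every pass merges groups of up to `max_merge_degree` runs and re-spills each result; that strategy and the function name are illustrative assumptions, not what the PR implements.

```rust
/// Number of sorted runs left after each merge pass, assuming every pass
/// merges groups of up to `max_merge_degree` runs into one re-spilled run.
/// Hypothetical helper for illustration only.
fn runs_after_each_pass(num_runs: usize, max_merge_degree: usize) -> Vec<usize> {
    assert!(max_merge_degree >= 2, "a merge needs at least two inputs");
    let mut remaining = num_runs;
    let mut passes = Vec::new();
    while remaining > 1 {
        // Ceiling division: a final partial group still produces one run.
        remaining = (remaining + max_merge_degree - 1) / max_merge_degree;
        passes.push(remaining);
    }
    passes
}
```

Under these assumptions, 100 spilled runs with a max merge degree of 8 shrink to 13, then 2, then 1 run, so three passes instead of one very wide merge.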
   
   I (or possibly @ding-young) can handle this patch in a follow-up PR. I think 
we can move forward with this one—I’ll review it in the next few days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

