2010YOUY01 commented on PR #15700: URL: https://github.com/apache/datafusion/pull/15700#issuecomment-3034804482
Regarding https://github.com/apache/datafusion/pull/15700#discussion_r2041372025, I have an idea to address this concern: add a max merge degree configuration, and do a re-spill if either
a. SPM's estimated memory exceeds the budget, or
b. the configured max merge degree has been reached.

I think this approach has two advantages:
1. If batch size bloats after the spill-and-read-back round trip (see https://github.com/apache/datafusion/pull/15700#discussion_r2041372025), a hard merge degree limit that overrides the memory estimate still lets the query finish.
2. It also helps tune for speed: even if we have enough memory to perform a very wide merge, limiting it to a narrower merge is still likely to run faster.

I (or possibly @ding-young) can handle this patch in a follow-up PR. I think we can move forward with this one; I'll review it in the next few days.
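To make the proposed re-spill rule concrete, here is a minimal sketch of a multi-pass merge where the merge width is capped by both a memory estimate and a hard max merge degree. This is not DataFusion's API: `max_merge_degree`, `merge_degree`, `multi_pass_merge`, and the per-run byte estimate are hypothetical names, spill files are modeled as in-memory sorted `Vec<i64>` runs, and memory is estimated as a fixed size per run.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for a spill file: one sorted run of values.
type Run = Vec<i64>;

/// K-way merge of sorted runs with a min-heap (plays the role of SPM here).
fn merge_runs(runs: &[Run]) -> Run {
    let mut heap = BinaryHeap::new();
    for (i, run) in runs.iter().enumerate() {
        if let Some(&v) = run.first() {
            // (value, run index, position within run); Reverse makes it a min-heap.
            heap.push(Reverse((v, i, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(runs.iter().map(|r| r.len()).sum());
    while let Some(Reverse((v, i, j))) = heap.pop() {
        out.push(v);
        if let Some(&next) = runs[i].get(j + 1) {
            heap.push(Reverse((next, i, j + 1)));
        }
    }
    out
}

/// Merge width allowed for one pass: the smaller of the hypothetical hard cap
/// `max_merge_degree` and what the memory budget allows per the estimate.
/// Clamped to 2 so every pass makes progress; a real implementation would
/// need to handle the case where even two runs do not fit.
fn merge_degree(max_merge_degree: usize, memory_budget: usize, est_bytes_per_run: usize) -> usize {
    let by_memory = memory_budget / est_bytes_per_run.max(1);
    max_merge_degree.min(by_memory).max(2)
}

/// While there are more runs than one pass may merge, merge the first
/// `degree` runs into a new run (the "re-spill") and append it. Returns the
/// final merged output and the number of re-spills performed.
fn multi_pass_merge(
    mut runs: Vec<Run>,
    max_merge_degree: usize,
    memory_budget: usize,
    est_bytes_per_run: usize,
) -> (Run, usize) {
    let degree = merge_degree(max_merge_degree, memory_budget, est_bytes_per_run);
    let mut respills = 0;
    while runs.len() > degree {
        let batch: Vec<Run> = runs.drain(..degree).collect();
        runs.push(merge_runs(&batch));
        respills += 1;
    }
    (merge_runs(&runs), respills)
}
```

With five runs, a hard cap of 2 forces three re-spills even when the memory budget is huge, while a budget that fits only three runs forces one; with both limits generous, the runs are merged in a single pass. The point of the hard cap is that it bounds the merge width independently of the memory estimate, so a wrong estimate (e.g. after batch size bloat) cannot make the merge arbitrarily wide.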