zhuqi-lucas opened a new issue, #21440: URL: https://github.com/apache/datafusion/issues/21440
**Is your feature request related to a problem or challenge?**

#21426 introduced a configurable fixed-size `BufferExec` capacity (default 1GB) for sort pushdown. While this is better than the `SortExec` it replaces (which buffers the entire partition), a fixed size is not optimal for all cases:

- **Wide rows** (many columns, large strings): 1GB may not be enough to buffer a sufficient number of rows
- **Narrow rows** (few small columns): 1GB buffers far more data than needed

As noted by @alamb in https://github.com/apache/datafusion/pull/21426#discussion_r3045025814:

> I suspect a better solution than a fixed size buffer would be some calculation based on the actual size of the data (e.g. the number of rows to buffer). However, that is tricky to compute / constrain memory when large strings are involved. We probably would need to have both a row limit and a memory cap and pick the smaller of the two.

**Describe the solution you'd like**

Replace the fixed capacity with a dual-limit approach:

```
BufferExec stops buffering when EITHER limit is reached:
- Row limit: e.g., 100K rows (prevents over-buffering narrow rows)
- Memory cap: e.g., 1GB (prevents OOM for wide rows)
```

This adapts to different row widths automatically:

- Narrow rows (100 bytes/row): row limit triggers at ~10MB
- Wide rows (10KB/row): memory cap triggers at 1GB

**Related issues:**

- #21426 — Make BufferExec capacity configurable (current fixed-size approach)
- #21417 — Original issue for configurable buffer
- #21182 — Sort pushdown phase 2 (introduced BufferExec)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
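The dual-limit stopping condition proposed above can be sketched as follows. This is a minimal illustration only; the struct and method names are hypothetical and do not reflect the actual `BufferExec` implementation in DataFusion:

```rust
/// Hypothetical dual-limit configuration; names are illustrative,
/// not the actual DataFusion `BufferExec` API.
struct BufferLimits {
    row_limit: usize,
    memory_cap_bytes: usize,
}

impl BufferLimits {
    /// Stop buffering as soon as EITHER limit is reached,
    /// i.e. take the smaller of the two effective capacities.
    fn should_stop(&self, buffered_rows: usize, buffered_bytes: usize) -> bool {
        buffered_rows >= self.row_limit || buffered_bytes >= self.memory_cap_bytes
    }
}

fn main() {
    let limits = BufferLimits {
        row_limit: 100_000,          // e.g., 100K rows
        memory_cap_bytes: 1 << 30,   // e.g., 1GiB
    };

    // Narrow rows (~100 bytes each): the row limit triggers first, at ~10MB.
    assert!(limits.should_stop(100_000, 100_000 * 100));
    assert!(!limits.should_stop(99_999, 99_999 * 100));

    // Very wide rows (~100KB each): the memory cap triggers first,
    // long before the row limit is reached.
    assert!(limits.should_stop(10_486, 10_486 * 102_400));
    assert!(!limits.should_stop(10_000, 10_000 * 102_400));
}
```

The `||` makes the buffer effectively `min(row_limit, memory_cap)` in whichever unit binds first, which is the "pick the smaller of the two" behavior suggested in the review comment.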