zhuqi-lucas opened a new issue, #21440:
URL: https://github.com/apache/datafusion/issues/21440

   **Is your feature request related to a problem or challenge?**
   
   #21426 introduced a configurable fixed-size `BufferExec` capacity (default 
1GB) for sort pushdown. While this is better than the `SortExec` it replaces 
(which buffers the entire partition), a fixed size is not optimal for all cases:
   
   - **Wide rows** (many columns, large strings): 1GB might not be enough row 
groups
   - **Narrow rows** (few small columns): 1GB buffers far more data than needed
   
   As noted by @alamb in 
https://github.com/apache/datafusion/pull/21426#discussion_r3045025814:
   
   > I suspect a better solution than a fixed size buffer would be some 
calculation based on the actual size of the data (e.g. the number of rows to 
buffer). However, that is tricky to compute / constrain memory when large 
strings are involved. We probably would need to have both a row limit and a 
memory cap and pick the smaller of the two.
   
   **Describe the solution you'd like**
   
   Replace the fixed capacity with a dual-limit approach:
   
   ```
   BufferExec stops buffering when EITHER limit is reached:
     - Row limit: e.g., 100K rows (prevents over-buffering narrow rows)
     - Memory cap: e.g., 1GB (prevents OOM for wide rows)
   ```
   
   This adapts to different row widths automatically:
   - Narrow rows (100 bytes/row): row limit triggers at ~10MB
   - Wide rows (10KB/row): memory cap triggers at 1GB
   
   **Related issues:**
   - #21426 — Make BufferExec capacity configurable (current fixed-size 
approach)
   - #21417 — Original issue for configurable buffer
   - #21182 — Sort pushdown phase 2 (introduced BufferExec)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to