waynexia commented on code in PR #10765:
URL: https://github.com/apache/datafusion/pull/10765#discussion_r1630529314
##########
datafusion/physical-plan/src/aggregates/mod.rs:
##########
@@ -1545,8 +1545,13 @@ mod tests {
input_schema,
)?);
- let result =
- common::collect(merged_aggregate.execute(0,
task_ctx.clone())?).await?;
+ // enlarge memory limit in spill mode
+ let task_ctx = if spill {
+ new_spill_ctx(2, 2600)
Review Comment:
This change has two stages to my understanding. The first partial aggr stage
has a smaller limit, to meet the expectation that the partial plan has early
emit.
https://github.com/apache/datafusion/blob/6846513616608fa34e2aa92495df9f879bbd063a/datafusion/physical-plan/src/aggregates/mod.rs#L1504-L1516
And the new lines enlarge the limit after that "early emit" check. Otherwise
the merge aggr would fall because of insufficient memory. (at line 1554)
```rust
let result = common::collect(merged_aggregate.execute(0, task_ctx)?).await?;
```
The root cause of this change is somehow the memory requirement becomes
larger. From my investigation, the biggest change compared to the previous is
from the `RawTable` in `GroupValuesPrimitive` -- it needs about 1000 bytes more:
https://github.com/apache/datafusion/blob/6846513616608fa34e2aa92495df9f879bbd063a/datafusion/physical-plan/src/aggregates/group_values/primitive.rs#L81-L95
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]