Rich-T-kid commented on code in PR #23262:
URL: https://github.com/apache/datafusion/pull/23262#discussion_r3508191354
##########
datafusion/physical-plan/src/aggregates/group_values/multi_group_by/mod.rs:
##########
@@ -1117,7 +1117,7 @@ impl<const STREAMING: bool> GroupValues for GroupValuesColumn<STREAMING> {
// a real Result rather than panicking.
let fresh = Self::build_group_columns(&self.schema)?;
let group_values = mem::replace(&mut self.group_values, fresh);
-
+ self.map.clear();
Review Comment:
`EmitTo::All` is used in two cases:
1. Memory is filling up, which leads to a spill. `EmitTo::All` is called
[here](https://github.com/apache/datafusion/blob/9e8dd76d6deb6736c51962d9c97e04be4e3f1fc9/datafusion/physical-plan/src/aggregates/row_hash.rs#L1173)
and the result is then written to disk. After `EmitTo::All` is called,
`GroupValues::clear_shrink()` gets
[called](https://github.com/apache/datafusion/blob/9e8dd76d6deb6736c51962d9c97e04be4e3f1fc9/datafusion/physical-plan/src/aggregates/row_hash.rs#L1182),
and that's when the map contents get cleared. IIRC `EmitTo::All` can be called
multiple times for spills. This won't matter, as each call is paired with
`GroupValues::clear_shrink()`.
2. All input has been seen, so output production begins. `EmitTo::All` is called
[here](https://github.com/apache/datafusion/blob/9e8dd76d6deb6736c51962d9c97e04be4e3f1fc9/datafusion/physical-plan/src/aggregates/row_hash.rs#L1276)
to produce all of the arrays, which then get repeatedly sliced. After
this the `exec_state` is set to [producing
output](https://github.com/apache/datafusion/blob/9e8dd76d6deb6736c51962d9c97e04be4e3f1fc9/datafusion/physical-plan/src/aggregates/row_hash.rs#L1279),
so `intern` is never called again.

Currently, in both cases, the ordering of the surrounding code keeps the state of
`GroupValuesColumn` logically correct.
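
To make the invariant concrete, here is a minimal toy sketch of the pattern, not the actual DataFusion code. `ToyGroupValues`, its fields, the `clear_map` flag, and `is_consistent` are all hypothetical names made up for illustration; the real `GroupValuesColumn` uses a hash table over row indices and typed column builders. The sketch only shows why an emit-all that leaves the map untouched is safe as long as the caller either resets the map afterwards or never interns again.

```rust
use std::collections::HashMap;
use std::mem;

/// Toy stand-in for a group-values store (hypothetical, not DataFusion's type).
struct ToyGroupValues {
    /// Maps a group key to its index in `values`.
    map: HashMap<String, usize>,
    /// Group keys in insertion order; the index is the group id.
    values: Vec<String>,
}

impl ToyGroupValues {
    fn new() -> Self {
        Self { map: HashMap::new(), values: Vec::new() }
    }

    /// Returns the group index for `key`, inserting it if unseen.
    fn intern(&mut self, key: &str) -> usize {
        if let Some(&idx) = self.map.get(key) {
            return idx;
        }
        let idx = self.values.len();
        self.values.push(key.to_string());
        self.map.insert(key.to_string(), idx);
        idx
    }

    /// Models emitting everything: swaps `values` for a fresh, empty buffer.
    /// `clear_map` stands in for the `self.map.clear()` line in the diff.
    fn emit_all(&mut self, clear_map: bool) -> Vec<String> {
        let emitted = mem::take(&mut self.values);
        if clear_map {
            self.map.clear();
        }
        emitted
    }

    /// Models the caller-side reset performed after a spill.
    fn clear_shrink(&mut self) {
        self.map.clear();
        self.values.clear();
    }

    /// True when every index stored in the map points at a live value.
    fn is_consistent(&self) -> bool {
        self.map.values().all(|&idx| idx < self.values.len())
    }
}
```

In this toy model, calling `emit_all(false)` leaves stale indices in the map until `clear_shrink()` runs, which mirrors the spill path in case 1. In case 2 the stale entries are never observed because `intern` is not called again. Passing `true` models the change in the diff, which makes the emit self-contained rather than relying on the caller's ordering.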
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]