jayzhan211 commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1823718252

##########
datafusion/physical-plan/src/aggregates/group_values/column.rs:
##########
@@ -196,6 +570,324 @@ impl GroupValues for GroupValuesColumn {
                         let b = ByteViewGroupValueBuilder::<BinaryViewType>::new();
                         v.push(Box::new(b) as _)
                     }
+                    dt => {
+                        return not_impl_err!(
+                            "{dt} not supported in VectorizedGroupValuesColumn"
+                        )
+                    }
+                }
+            }
+            self.group_values = v;
+        }
+
+        // tracks to which group each of the input rows belongs
+        groups.clear();
+        groups.resize(n_rows, usize::MAX);
+
+        let mut batch_hashes = mem::take(&mut self.hashes_buffer);
+        batch_hashes.clear();
+        batch_hashes.resize(n_rows, 0);
+        create_hashes(cols, &self.random_state, &mut batch_hashes)?;
+
+        // General steps for one round `vectorized equal_to & append`:
+        //   1. Collect vectorized context by checking hash values of `cols` in `map`,
+        //      mainly fill `vectorized_append_row_indices`, `vectorized_equal_to_row_indices`
+        //      and `vectorized_equal_to_group_indices`
+        //
+        //   2. Perform `vectorized_append` for `vectorized_append_row_indices`.
+        //      `vectorized_append` must be performed before `vectorized_equal_to`,
+        //      because some `group indices` in `vectorized_equal_to_group_indices`
+        //      may be actually placeholders, and still point to no actual values in

Review Comment:
   I'm unclear on the meaning of the placeholders here, but I understand why we need to append values first. Groups that require an equality check may need to compare against the newly added groups.

   In `collect_vectorized_process_context`, new groups only update the hash in the hash table without appending values yet. After checking the hash, rows are allocated to groups requiring an equality check. Therefore, we need to append the new groups first so that equality checks can include the newly appended groups.
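
   To make the ordering concrete, here is a minimal sketch of how I read it. This is a toy model and not the code in this PR: `ToyGroups`, `toy_hash` and `intern` are hypothetical names, there is a single `i64` column, and the toy hash never collides, so the collision handling a real implementation needs is left out.

```rust
use std::collections::HashMap;

/// Toy stand-in for the group value store (hypothetical, not DataFusion's type).
#[derive(Default)]
struct ToyGroups {
    /// hash -> group index. A new group is registered here before its value is stored.
    map: HashMap<u64, usize>,
    /// Stored group values, indexed by group index.
    values: Vec<i64>,
}

/// Toy hash: multiplying by an odd constant is a bijection on u64,
/// so distinct inputs never collide in this sketch.
fn toy_hash(v: i64) -> u64 {
    (v as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

impl ToyGroups {
    /// Returns the group index of every input row.
    fn intern(&mut self, input: &[i64]) -> Vec<usize> {
        let mut groups = vec![usize::MAX; input.len()];
        let mut append_rows = Vec::new();
        let mut equal_to_rows = Vec::new();
        let mut equal_to_groups = Vec::new();

        // Step 1: look at hashes only. New groups get an index in `map`,
        // but `values` has no entry for them yet.
        let mut next_group = self.values.len();
        for (row, &v) in input.iter().enumerate() {
            let h = toy_hash(v);
            match self.map.get(&h).copied() {
                Some(g) => {
                    // May refer to a group created earlier in this same batch.
                    equal_to_rows.push(row);
                    equal_to_groups.push(g);
                }
                None => {
                    self.map.insert(h, next_group);
                    groups[row] = next_group;
                    append_rows.push(row);
                    next_group += 1;
                }
            }
        }

        // Step 2: append first, so every group index recorded above
        // now points at a stored value.
        for &row in &append_rows {
            self.values.push(input[row]);
        }

        // Step 3: equality check. Running this before step 2 would index
        // past the end of `values` for groups created in this batch.
        for (&row, &g) in equal_to_rows.iter().zip(&equal_to_groups) {
            if self.values[g] == input[row] {
                groups[row] = g;
            }
            // Unequal rows (hash collisions) would need another round; omitted here.
        }
        groups
    }
}
```

   With input `[7, 7, 9]` on an empty `ToyGroups`, row 1 is queued for an equality check against group 0, which row 0 created in the same batch. That check only works because the value of group 0 was appended in step 2 before step 3 ran.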