Dandandan commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1823150363


##########
datafusion/physical-plan/src/aggregates/group_values/column.rs:
##########
@@ -125,6 +233,292 @@ impl GroupValuesColumn {
                 | DataType::BinaryView
         )
     }
+
+    /// Collect vectorized context by checking hash values of `cols` in `map`
+    ///
+    /// 1. If bucket not found
+    ///   - Build and insert the `new inlined group index view`
+    ///     and its hash value to `map`
+    ///   - Add row index to `vectorized_append_row_indices`
+    ///   - Set group index to row in `groups`
+    ///
+    /// 2. bucket found
+    ///   - Add row index to `vectorized_equal_to_row_indices`
+    ///   - Check if the `group index view` is `inlined` or `non_inlined`:
+    ///     If it is inlined, add to `vectorized_equal_to_group_indices` directly.
+    ///     Otherwise get all group indices from `group_index_lists`, and add them.
+    ///
+    fn collect_vectorized_process_context(
+        &mut self,
+        batch_hashes: &[u64],
+        groups: &mut Vec<usize>,
+    ) {
+        self.vectorized_append_row_indices.clear();
+        self.vectorized_equal_to_row_indices.clear();
+        self.vectorized_equal_to_group_indices.clear();
+
+        let mut group_values_len = self.group_values[0].len();
+        for (row, &target_hash) in batch_hashes.iter().enumerate() {
+            let entry = self.map.get(target_hash, |(exist_hash, _)| {

Review Comment:
   This is probably not a very hot path at the moment. I was also surprised to see that, for hash join, other parts (e.g. traversal of matching rows and the equality check) are much more expensive than the map lookup.
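
   For readers following the thread, the two cases in the doc comment above can be sketched roughly as below. This is a simplified illustration, not the PR's code: it assumes a plain `std::collections::HashMap<u64, Vec<usize>>` in place of the raw hash table and the inlined / non-inlined `group index view`, and the names `VectorizedContext`, `collect_context` and `next_group_index` are hypothetical. It also records one (row, group) pair per candidate group and uses `Option<usize>` in `groups`, which may differ from what the PR does.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for the three scratch vectors named in the doc comment.
#[derive(Default)]
struct VectorizedContext {
    /// Rows whose hash was not in the map; their values become new groups.
    append_row_indices: Vec<usize>,
    /// Rows whose hash was found; each still needs an equality check.
    equal_to_row_indices: Vec<usize>,
    /// Candidate group for the row at the same position in `equal_to_row_indices`.
    equal_to_group_indices: Vec<usize>,
}

/// Splits the rows of a batch into "append" and "compare" work lists.
/// Assumption: `map` goes from a hash value to every group index sharing that hash.
fn collect_context(
    map: &mut HashMap<u64, Vec<usize>>,
    next_group_index: &mut usize,
    batch_hashes: &[u64],
    groups: &mut Vec<Option<usize>>,
) -> VectorizedContext {
    let mut ctx = VectorizedContext::default();
    groups.clear();
    groups.resize(batch_hashes.len(), None);

    for (row, &hash) in batch_hashes.iter().enumerate() {
        match map.entry(hash) {
            // Case 1: bucket not found. Register a new group for this hash,
            // queue the row for appending, and record its group index.
            Entry::Vacant(slot) => {
                let group_index = *next_group_index;
                *next_group_index += 1;
                slot.insert(vec![group_index]);
                ctx.append_row_indices.push(row);
                groups[row] = Some(group_index);
            }
            // Case 2: bucket found. The hash matches, but the values may not,
            // so queue the row against every candidate group for a later check.
            Entry::Occupied(slot) => {
                for &group_index in slot.get() {
                    ctx.equal_to_row_indices.push(row);
                    ctx.equal_to_group_indices.push(group_index);
                }
            }
        }
    }
    ctx
}
```

   In this sketch, calling `collect_context` on the hashes `[10, 20, 10]` with an empty map queues rows 0 and 1 for appending and row 2 for comparison against group 0. The lookup itself is a single probe per row; the queued equality checks are where the remaining work goes, which is in line with the observation in the review comment.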