alamb commented on code in PR #11718:
URL: https://github.com/apache/datafusion/pull/11718#discussion_r1696791284
##########
datafusion/physical-plan/src/aggregates/group_values/row.rs:
##########
@@ -120,12 +120,13 @@ impl GroupValues for GroupValuesRows {
batch_hashes.resize(n_rows, 0);
create_hashes(cols, &self.random_state, batch_hashes)?;
- for (row, &hash) in batch_hashes.iter().enumerate() {
- let entry = self.map.get_mut(hash, |(_hash, group_idx)| {
+ for (row, &target_hash) in batch_hashes.iter().enumerate() {
+ let entry = self.map.get_mut(target_hash, |(exist_hash,
group_idx)| {
// verify that a group that we are inserting with hash is
// actually the same key value as the group in
// existing_idx (aka group_values @ row)
- group_rows.row(row) == group_values.row(*group_idx)
+ target_hash == *exist_hash
Review Comment:
I was curious -- I think the code @Rachelint is referring to in hashbrown is
here
https://github.com/rust-lang/hashbrown/blob/ac00a0bbef46f02f555e235f57ce263aefa361e0/src/raw/mod.rs#L2183-L2199
It does appear that collisions could happen (it is doing some sort of
abbreviated check by condensing the actual hash value to a byte or something),
though I don't fully understand how it works.
I wonder if there are other places we could use this observation 🤔
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]