alamb commented on code in PR #19910:
URL: https://github.com/apache/datafusion/pull/19910#discussion_r2722561289


##########
datafusion/physical-expr-common/src/binary_map.rs:
##########
@@ -389,7 +389,7 @@ where
                 // is value is already present in the set?
                 let entry = self.map.find_mut(hash, |header| {
                     // compare value if hashes match
-                    if header.len != value_len {
+                    if header.hash != hash {

Review Comment:
   I am double checking the reasoning, if the hash values aren't the same we 
know the values can not be the same either due to the requirements of the Hash 
trait
   
   However (as codex points out to me) there is some small subtle chance that 
if two different values collide on hash, **and their inline encodings match**, 
they will be treated as equal even though the byte sequences differ which is 
what the old len check avoided
   
   So I think we still need the old length checks too 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to