Dandandan commented on issue #790:
URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-888991616


   > Nice write-up and very interesting discussions!
   > 
   > * By feeding the signature as a key to the `HashMap`, are we not hashing the original key twice? I guess this can easily be solved by setting the identity function instead of the default hasher on the `HashMap`  😃
   
   Yes, that's also what we currently do for the hash join algorithm. It's a small performance win, and it also avoids the higher re-hashing cost when the hashmap grows.
   Hashing a `u64` was already much cheaper than hashing a complex nested key, though.
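
A minimal sketch of the identity-hasher idea, assuming every key is a `u64` that is already a well-distributed hash (for example, a precomputed row hash). `IdentityHasher`, `IdentityBuildHasher`, `HashToRows` and `build_index` are hypothetical names for illustration, not DataFusion's actual types. The map hashes each key by calling `write_u64` and then `finish`, so the stored value is returned unchanged and the key is not hashed a second time:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical pass-through hasher. Assumes every key is a u64 that is
/// already a well-distributed hash, and returns it unchanged.
#[derive(Default)]
struct IdentityHasher {
    hash: u64,
}

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.hash
    }

    fn write(&mut self, _bytes: &[u8]) {
        // This sketch only supports u64 keys (which go through write_u64).
        unreachable!("IdentityHasher only supports u64 keys");
    }

    fn write_u64(&mut self, n: u64) {
        // `impl Hash for u64` calls write_u64, so the key passes straight through.
        self.hash = n;
    }
}

type IdentityBuildHasher = BuildHasherDefault<IdentityHasher>;

/// Hypothetical index: precomputed row hash -> indices of rows with that hash.
type HashToRows = HashMap<u64, Vec<usize>, IdentityBuildHasher>;

fn build_index(row_hashes: &[u64]) -> HashToRows {
    let mut map: HashToRows =
        HashMap::with_capacity_and_hasher(row_hashes.len(), IdentityBuildHasher::default());
    for (row, &h) in row_hashes.iter().enumerate() {
        map.entry(h).or_insert_with(Vec::new).push(row);
    }
    map
}

fn main() {
    let hashes = [42u64, 7, 42, 99];
    let index = build_index(&hashes);
    assert_eq!(index[&42], vec![0, 2]);
    println!("{:?}", index.get(&7)); // prints "Some([1])"
}
```

The trade-off is that the map now relies entirely on the quality of the precomputed hashes; if they are poorly distributed, nothing re-mixes them before bucketing.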
   
   I believe a hashmap could also be implemented manually using a `Vec` and a number of buckets. When I tried that, it was slower; I think that's because `HashMap` itself is quite fast at collision checks.
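
For illustration only, a minimal sketch of what a hand-rolled `Vec`-of-buckets index could look like, using separate chaining and assuming keys are precomputed `u64` hashes. `BucketIndex` and its methods are hypothetical; this is not the implementation that was benchmarked above. Because different hashes can land in the same bucket, every lookup has to compare stored hashes (the collision check), which is the part that std's `HashMap` (a port of hashbrown) handles efficiently with its probing scheme:

```rust
/// Hypothetical hand-rolled hash index: a fixed number of buckets stored in
/// a Vec, each bucket holding (hash, row) pairs (separate chaining).
struct BucketIndex {
    buckets: Vec<Vec<(u64, usize)>>,
    mask: u64,
}

impl BucketIndex {
    /// Creates at least `min_buckets` buckets, rounded up to a power of two
    /// so a bucket can be selected with a bit mask instead of a modulo.
    fn new(min_buckets: usize) -> Self {
        let n = min_buckets.max(1).next_power_of_two();
        BucketIndex {
            buckets: vec![Vec::new(); n],
            mask: (n - 1) as u64,
        }
    }

    fn insert(&mut self, hash: u64, row: usize) {
        let b = (hash & self.mask) as usize;
        self.buckets[b].push((hash, row));
    }

    /// Returns the rows whose stored hash equals `hash`. Other hashes may
    /// share the bucket, so each entry is compared (the collision check).
    fn get(&self, hash: u64) -> Vec<usize> {
        let b = (hash & self.mask) as usize;
        self.buckets[b]
            .iter()
            .filter(|(h, _)| *h == hash)
            .map(|(_, row)| *row)
            .collect()
    }
}

fn main() {
    // With 8 buckets, 42 and 10 both map to bucket 2 and must be told apart.
    let hashes = [42u64, 7, 42, 99, 10];
    let mut index = BucketIndex::new(hashes.len());
    for (row, &h) in hashes.iter().enumerate() {
        index.insert(h, row);
    }
    assert_eq!(index.get(42), vec![0, 2]);
    assert_eq!(index.get(5), Vec::<usize>::new());
}
```

Each bucket here is a separate heap allocation and lookups scan it linearly, which is one plausible reason such a simple layout loses to a tuned open-addressing table.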

