Dandandan commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-888991616
> Nice write-up and very interesting discussions!
>
> * By feeding the signature as a key to the `HashMap`, are we not hashing the original key twice? I guess this can easily be solved by setting the identity function instead of the default hasher on the `HashMap` 😃

Yes, that's also what we currently do for the hash join algorithm. It's a small performance win. It also avoids the higher re-hashing cost when growing the hashmap. The cost of hashing a `u64` was already much smaller, though, than the cost of hashing a complex nested key.

I believe a hashmap could also be implemented manually using a `Vec` and a number of buckets. When I tried that, it was slower, I think because `HashMap` itself is quite fast at collision checks.
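For readers unfamiliar with the identity-hasher trick discussed above, here is a minimal sketch of a `HashMap` keyed by precomputed `u64` hashes. The names `IdHasher`, `IdHashBuilder`, and `group_rows_by_hash` are illustrative assumptions for this sketch and are not claimed to match the DataFusion code.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical pass-through hasher: the key is already a hash value,
/// so `finish` returns it unchanged instead of hashing it a second time.
#[derive(Default)]
struct IdHasher {
    hash: u64,
}

impl Hasher for IdHasher {
    fn finish(&self) -> u64 {
        self.hash
    }

    // Only `u64` keys are expected; any other key type is a logic error.
    fn write(&mut self, _bytes: &[u8]) {
        unreachable!("IdHasher only supports u64 keys");
    }

    // `u64`'s `Hash` impl calls `write_u64`, so the key is stored as-is.
    fn write_u64(&mut self, i: u64) {
        self.hash = i;
    }
}

/// Builder that creates a fresh `IdHasher` for each hashing operation.
type IdHashBuilder = BuildHasherDefault<IdHasher>;

/// Groups row indices by their precomputed hash value (illustrative helper).
fn group_rows_by_hash(hashes: &[u64]) -> HashMap<u64, Vec<usize>, IdHashBuilder> {
    let mut map: HashMap<u64, Vec<usize>, IdHashBuilder> =
        HashMap::with_capacity_and_hasher(hashes.len(), IdHashBuilder::default());
    for (row, &hash) in hashes.iter().enumerate() {
        map.entry(hash).or_default().push(row);
    }
    map
}
```

Calling `group_rows_by_hash(&[10, 20, 10])` yields a map with two entries, where key `10` holds rows `0` and `2`. Because distinct original keys can share the same `u64` hash, rows in one bucket still need to be compared against the actual key values before being treated as equal.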