[ https://issues.apache.org/jira/browse/DRILL-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673860#comment-16673860 ]
Boaz Ben-Zvi commented on DRILL-6825:
-------------------------------------

We were talking a while back about changing how hash functions are used: instead of generating code, make a virtual call that computes the hash value for each type of vector (similar to `copyEntry()` in `ValueVector`), and then compute a row's hash value by iterating over the key columns (similar to `appendRow()` in `VectorContainer`, though it would need to know which columns belong to the key). This would also move the hash value computation out of the HashTable. I don't remember whether a Jira was opened for that work.

This would definitely make it simpler to use a different hash function for each data type.

One last point: we may need to keep hashing compatible across the integer types, so ideally `HashValue(X as smallint) == HashValue(X as int) == HashValue(X as bigint)`.

> Applying different hash function according to data types and data size
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6825
>                 URL: https://issues.apache.org/jira/browse/DRILL-6825
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Codegen
>            Reporter: weijie.tong
>            Priority: Major
>             Fix For: 1.16.0
>
>
> Different hash functions have different performance according to different
> data types and data size. We should choose the right one to apply, not just
> Murmurhash.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
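The proposal in the comment can be sketched in Java, Drill's implementation language. This is a minimal sketch under stated assumptions: `HashableVector`, `SmallIntVector`, `IntVector`, `BigIntVector`, `hashLong` and `hashRow` are hypothetical names for illustration, not Drill's `ValueVector` or `VectorContainer` API. Each vector type computes its own hash through a virtual call, a row's hash is built by iterating over the key columns only, and every integer width is widened to `long` before mixing so that a smallint, int and bigint holding the same value hash identically. The mixer is MurmurHash3's 64-bit finalizer, picked only as an example; swapping in a per-type function would mean changing one `hash()` implementation.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: per-vector hashing via a virtual call, plus row hashing
 * by iterating over the key columns. None of these types are Drill APIs.
 */
public class RowHashSketch {

    /** Hypothetical hook each vector type would implement (cf. copyEntry()). */
    interface HashableVector {
        /** Hashes the value at {@code row}, chained with the running {@code seed}. */
        long hash(int row, long seed);
    }

    /** 64-bit finalizer ("fmix64") from MurmurHash3, used here as the mixer. */
    static long mix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /** All integer widths funnel through this long-based path for compatibility. */
    static long hashLong(long value, long seed) {
        return mix64(seed ^ value);
    }

    static final class SmallIntVector implements HashableVector {
        private final short[] values;
        SmallIntVector(short... values) { this.values = values; }
        // short is sign-extended to long, so -3 hashes like -3L.
        @Override public long hash(int row, long seed) { return hashLong(values[row], seed); }
    }

    static final class IntVector implements HashableVector {
        private final int[] values;
        IntVector(int... values) { this.values = values; }
        @Override public long hash(int row, long seed) { return hashLong(values[row], seed); }
    }

    static final class BigIntVector implements HashableVector {
        private final long[] values;
        BigIntVector(long... values) { this.values = values; }
        @Override public long hash(int row, long seed) { return hashLong(values[row], seed); }
    }

    /** Hashes one row over the key columns only, feeding each result in as the next seed. */
    static long hashRow(List<HashableVector> keyColumns, int row, long seed) {
        long h = seed;
        for (HashableVector column : keyColumns) {
            h = column.hash(row, h);
        }
        return h;
    }

    public static void main(String[] args) {
        HashableVector small = new SmallIntVector((short) 7, (short) -3);
        HashableVector wide = new BigIntVector(7L, -3L);
        // Same logical value, different widths: same hash.
        System.out.println(small.hash(1, 0L) == wide.hash(1, 0L)); // prints true
        long a = hashRow(Arrays.asList(small, new IntVector(42, 43)), 0, 0L);
        long b = hashRow(Arrays.asList(wide, new BigIntVector(42L, 43L)), 0, 0L);
        System.out.println(a == b); // prints true
    }
}
```

Chaining the running hash through each column as the seed keeps the row hash order-dependent, so the key `(1, 2)` will usually hash differently from `(2, 1)`. The integer compatibility comes entirely from the widening step; non-integer types (for example `float8`) would need their own rule and are not covered by this sketch.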