[jira] [Commented] (HIVE-16592) Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes
[ https://issues.apache.org/jira/browse/HIVE-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001358#comment-16001358 ]

Sergey Shelukhin commented on HIVE-16592:
-----------------------------------------

+1

> Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16592
>                 URL: https://issues.apache.org/jira/browse/HIVE-16592
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>         Attachments: HIVE-16592.1.patch
>
>
> {code}
>   public static int calculateLongHashCode(long key) {
>     key = (~key) + (key << 21); // key = (key << 21) - key - 1;
>     key = key ^ (key >>> 24);
>     key = (key + (key << 3)) + (key << 8); // key * 265
>     key = key ^ (key >>> 14);
>     key = (key + (key << 2)) + (key << 4); // key * 21
>     key = key ^ (key >>> 28);
>     key = key + (key << 31);
>     return (int) key;
>   }
> {code}
> Does not mix enough bits into the lower 32 bits, which are used for the bucket probes.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (HIVE-16592) Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes
[ https://issues.apache.org/jira/browse/HIVE-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999787#comment-15999787 ]

Hive QA commented on HIVE-16592:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12866760/HIVE-16592.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10652 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_binary_external_table_queries] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5098/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5098/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5098/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12866760 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-16592) Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes
[ https://issues.apache.org/jira/browse/HIVE-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999733#comment-15999733 ]

Gopal V commented on HIVE-16592:
--------------------------------

The best hash function in my tests for the working set I use has been

{code}
    key ^= (key >>> 20) ^ (key >>> 12);
    key ^= (key >>> 7) ^ (key >>> 4);
    return (int)key;
{code}

Unlike the hash64shift, this isn't reversible.

> Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16592
>                 URL: https://issues.apache.org/jira/browse/HIVE-16592
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Priority: Minor
>
> {code}
>   public static int calculateLongHashCode(long key) {
>     key = (~key) + (key << 21); // key = (key << 21) - key - 1;
>     key = key ^ (key >>> 24);
>     key = (key + (key << 3)) + (key << 8); // key * 265
>     key = key ^ (key >>> 14);
>     key = (key + (key << 2)) + (key << 4); // key * 21
>     key = key ^ (key >>> 28);
>     key = key + (key << 31);
>     return (int) key;
>   }
> {code}
> Does not mix enough bits into the lower 32 bits, which are used for the bucket probes.
> The 1997 document lists
> {code}
> public int hash6432shift(long key)
> {
>   key = (~key) + (key << 18); // key = (key << 18) - key - 1;
>   key = key ^ (key >>> 31);
>   key = key * 21; // key = (key + (key << 2)) + (key << 4);
>   key = key ^ (key >>> 11);
>   key = key + (key << 6);
>   key = key ^ (key >>> 22);
>   return (int) key;
> }
> {code}
> as the algorithm for keeping the lower 32 bits well distributed.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
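
The three candidates discussed in this thread can be compared side by side. What follows is a minimal Java sketch, not the HIVE-16592 patch. The class name `LongHashSketch`, the method name `xorShiftHash`, and the helper `distinctBuckets` are hypothetical names introduced here for illustration. The helper also assumes a power-of-two bucket count indexed by masking the low bits of the hash code; that is an assumption about the probe scheme, not something the issue states.

```java
import java.util.BitSet;
import java.util.function.LongToIntFunction;

final class LongHashSketch {
    private LongHashSketch() {}

    // The 64-bit mix quoted in the issue description, truncated to int at the end.
    static int hash64shift(long key) {
        key = (~key) + (key << 21);            // key = (key << 21) - key - 1
        key = key ^ (key >>> 24);
        key = (key + (key << 3)) + (key << 8); // key * 265
        key = key ^ (key >>> 14);
        key = (key + (key << 2)) + (key << 4); // key * 21
        key = key ^ (key >>> 28);
        key = key + (key << 31);
        return (int) key;
    }

    // The 64-to-32-bit variant quoted in the issue description.
    static int hash6432shift(long key) {
        key = (~key) + (key << 18);            // key = (key << 18) - key - 1
        key = key ^ (key >>> 31);
        key = key * 21;                        // (key + (key << 2)) + (key << 4)
        key = key ^ (key >>> 11);
        key = key + (key << 6);
        key = key ^ (key >>> 22);
        return (int) key;
    }

    // The xor-shift function from the comment above (hypothetical method name).
    static int xorShiftHash(long key) {
        key ^= (key >>> 20) ^ (key >>> 12);
        key ^= (key >>> 7) ^ (key >>> 4);
        return (int) key;
    }

    // Hypothetical helper: counts how many buckets a key set touches when the
    // bucket index is the low bits of the hash code (ASSUMED probe scheme).
    static int distinctBuckets(long[] keys, int bucketCount, LongToIntFunction hash) {
        if (bucketCount <= 0 || Integer.bitCount(bucketCount) != 1) {
            throw new IllegalArgumentException("bucketCount must be a power of two");
        }
        int mask = bucketCount - 1;
        BitSet used = new BitSet(bucketCount);
        for (long key : keys) {
            used.set(hash.applyAsInt(key) & mask);
        }
        return used.cardinality();
    }
}
```

For example, `LongHashSketch.distinctBuckets(keys, 1024, LongHashSketch::hash6432shift)` reports how many of 1024 buckets a key set touches. Running it with each of the three functions over a representative working set gives a rough view of how well the low bits are spread, though it is no substitute for measuring probe lengths in the real hash table.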