[jira] [Commented] (HIVE-16592) Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes

2017-05-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001358#comment-16001358
 ] 

Sergey Shelukhin commented on HIVE-16592:
-

+1

> Vectorization: Long hashes use hash64shift and not hash6432shift to generate 
> int hashCodes
> --
>
> Key: HIVE-16592
> URL: https://issues.apache.org/jira/browse/HIVE-16592
> Project: Hive
> Issue Type: Bug
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Minor
> Attachments: HIVE-16592.1.patch
>
>
> {code}
> public static int calculateLongHashCode(long key) {
>   key = (~key) + (key << 21); // key = (key << 21) - key - 1;
>   key = key ^ (key >>> 24);
>   key = (key + (key << 3)) + (key << 8); // key * 265
>   key = key ^ (key >>> 14);
>   key = (key + (key << 2)) + (key << 4); // key * 21
>   key = key ^ (key >>> 28);
>   key = key + (key << 31);
>   return (int) key;
> }
> {code}
> This function does not mix enough bits into the lower 32 bits, which are
> used for the bucket probes.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16592) Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes

2017-05-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999787#comment-15999787
 ] 

Hive QA commented on HIVE-16592:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12866760/HIVE-16592.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10652 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_binary_external_table_queries] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5098/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5098/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5098/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12866760 - PreCommit-HIVE-Build






[jira] [Commented] (HIVE-16592) Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes

2017-05-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999733#comment-15999733
 ] 

Gopal V commented on HIVE-16592:


The best hash function in my tests for the working set I use has been 

{code}
  key ^= (key >>> 20) ^ (key >>> 12);
  key ^= (key >>> 7) ^ (key >>> 4);
  return (int)key;
{code}

Unlike the hash64shift, this isn't reversible.
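One concrete way to see the loss of information in the fold above is to trace which input bits can reach the low 32 bits that survive the cast. The sketch below wraps the three quoted lines in a method; the class and method names are hypothetical and not taken from the patch. After the two XOR-shift steps, output bits 0 to 31 depend only on input bits 0 to 58, so two keys that differ only in bit 63 produce the same int.

```java
public class XorShiftFoldSketch {

    // The fold quoted in the comment above, wrapped in a hypothetical method.
    public static int xorShiftFold(long key) {
        key ^= (key >>> 20) ^ (key >>> 12);
        key ^= (key >>> 7) ^ (key >>> 4);
        return (int) key; // keeps only the low 32 bits
    }

    public static void main(String[] args) {
        long a = 5L;
        long b = a ^ (1L << 63); // differs from a only in bit 63

        // Bit 63 lands in bits 63, 51 and 43 after the first step; the second
        // step pulls in bits no higher than 38, so bit 63 never reaches the result.
        System.out.println(xorShiftFold(a) == xorShiftFold(b)); // prints "true"
    }
}
```

This only illustrates that distinct long keys can share a hash code; it says nothing about how well the function spreads a particular working set.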

> Vectorization: Long hashes use hash64shift and not hash6432shift to generate 
> int hashCodes
> --
>
> Key: HIVE-16592
> URL: https://issues.apache.org/jira/browse/HIVE-16592
> Project: Hive
> Issue Type: Bug
> Reporter: Gopal V
> Priority: Minor
>
> {code}
> public static int calculateLongHashCode(long key) {
>   key = (~key) + (key << 21); // key = (key << 21) - key - 1;
>   key = key ^ (key >>> 24);
>   key = (key + (key << 3)) + (key << 8); // key * 265
>   key = key ^ (key >>> 14);
>   key = (key + (key << 2)) + (key << 4); // key * 21
>   key = key ^ (key >>> 28);
>   key = key + (key << 31);
>   return (int) key;
> }
> {code}
> This function does not mix enough bits into the lower 32 bits, which are
> used for the bucket probes.
> The 1997 document lists 
> {code}
> public int hash6432shift(long key)
> {
>   key = (~key) + (key << 18); // key = (key << 18) - key - 1;
>   key = key ^ (key >>> 31);
>   key = key * 21; // key = (key + (key << 2)) + (key << 4);
>   key = key ^ (key >>> 11);
>   key = key + (key << 6);
>   key = key ^ (key >>> 22);
>   return (int) key;
> }
> {code}
> as the algorithm for keeping the lower 32 bits well distributed.
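A rough way to compare the two quoted functions on a given key set is to count how many buckets of a power-of-two table receive at least one key when the bucket index is taken from the low bits of the hash. The harness below is a minimal sketch under stated assumptions: the class name, the `occupiedBuckets` helper, the table size, and the sample key pattern are all hypothetical, and the bucket-index scheme (masking the low bits) is a simplification rather than Hive's actual probe logic.

```java
import java.util.function.LongToIntFunction;

public class LongHashBucketSketch {

    // hash64shift truncated to int, as quoted in the issue description.
    public static int hash64shiftTruncated(long key) {
        key = (~key) + (key << 21);
        key = key ^ (key >>> 24);
        key = (key + (key << 3)) + (key << 8); // key * 265
        key = key ^ (key >>> 14);
        key = (key + (key << 2)) + (key << 4); // key * 21
        key = key ^ (key >>> 28);
        key = key + (key << 31);
        return (int) key;
    }

    // hash6432shift, as quoted in the issue description.
    public static int hash6432shift(long key) {
        key = (~key) + (key << 18);
        key = key ^ (key >>> 31);
        key = key * 21;
        key = key ^ (key >>> 11);
        key = key + (key << 6);
        key = key ^ (key >>> 22);
        return (int) key;
    }

    // Hypothetical helper: counts buckets hit at least once.
    // numBuckets must be a power of two so the mask selects the low bits.
    public static int occupiedBuckets(LongToIntFunction hash, long[] keys, int numBuckets) {
        boolean[] seen = new boolean[numBuckets];
        int occupied = 0;
        for (long k : keys) {
            int bucket = hash.applyAsInt(k) & (numBuckets - 1);
            if (!seen[bucket]) {
                seen[bucket] = true;
                occupied++;
            }
        }
        return occupied;
    }

    public static void main(String[] args) {
        // Assumed sample keys: multiples of 1024. Substitute a real working set.
        long[] keys = new long[1 << 16];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = (long) i << 10;
        }
        int numBuckets = 1 << 16;
        System.out.println("hash64shift (truncated): "
            + occupiedBuckets(LongHashBucketSketch::hash64shiftTruncated, keys, numBuckets));
        System.out.println("hash6432shift:           "
            + occupiedBuckets(LongHashBucketSketch::hash6432shift, keys, numBuckets));
    }
}
```

A higher occupied count means fewer keys share a bucket for that key set; results depend heavily on the key pattern, so any conclusion should come from keys that resemble the real workload.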


