[ 
https://issues.apache.org/jira/browse/LUCENE-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-977:
--------------------------------

    Attachment: hash.patch

Here is a patch that adds in 7 new bits (the rightmost bit is destroyed to make 
the number odd) when calculating the incrementor via
              final int inc = ((code>>8)+code)|1;
And thus gives 128 different possible paths to follow *per slot* on the first 
collision on that slot.
Ideally we would shift log2(size_of_hashtable), but it's probably not worth 
calculating that and I chose a small shift so it would work well for small 
hashCodes (say from a very short string).

Given that equals() in these cases is probably pretty fast, the average speedup 
is probably relatively minimal.

Comments?


> internal hashing improvements
> -----------------------------
>
>                 Key: LUCENE-977
>                 URL: https://issues.apache.org/jira/browse/LUCENE-977
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Yonik Seeley
>         Attachments: hash.patch
>
>
> Internal power-of-two closed hashtable traversal in DocumentsWriter and 
> CharArraySet could be better.
> Here is the current method of resolving collisions:
>     if (text2 != null && !equals(text, len, text2)) {
>       final int inc = code*1347|1;
>       do {
>         code += inc;
>         pos = code & mask;
>         text2 = entries[pos];
>       } while (text2 != null && !equals(text, len, text2));
> The problem is that two different hashCodes with the same lower bits will 
> keep picking the same slots (the upper bits will be ignored).
> This is because multiplication (*1347) only really shifts bits to the left... 
> so given that the two codes already matched on the right, they will both pick 
> the same increment, and this will keep them on the same path through the 
> table (even though it's being added to numbers that differ on the left).  To 
> resolve this, some bits need to be moved to the right when calculating the 
> increment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to