[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968122#comment-13968122 ]
Adrien Grand commented on LUCENE-5604:
--------------------------------------
Strong +1 on this change!
bq. Separately, I also tried a different probing function inside BytesRefHash
I'm wondering if we should try linear probing? Now that we use a good hash
function, the likelihood of having clusters of hashes in the hash table is much
lower (especially given that BytesRefHash hard-codes quite a low load factor,
0.5), so linear probing might help get some performance back, since it tends to
be more cache-friendly.
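
To make the idea concrete, here is a minimal sketch of an open-addressing table
that uses linear probing and resizes to keep the load factor at or below 0.5.
It is not BytesRefHash's actual code: the class name, the List-based key
storage, the return convention of add() and the stand-in hash() are all
assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a tiny byte[] -> id table with linear probing. */
final class LinearProbingSketch {
    private final List<byte[]> keys = new ArrayList<>(); // id -> key bytes
    private int[] ids = new int[16];                     // slot -> id, -1 if empty
    private int mask = ids.length - 1;                   // table size is a power of two

    LinearProbingSketch() {
        Arrays.fill(ids, -1);
    }

    /** Returns the new id, or -(existingId + 1) if the key is already present. */
    int add(byte[] key) {
        int slot = hash(key) & mask;
        // Linear probing: on a collision, step to the adjacent slot.
        while (ids[slot] != -1) {
            if (Arrays.equals(keys.get(ids[slot]), key)) {
                return -(ids[slot] + 1);
            }
            slot = (slot + 1) & mask;
        }
        int id = keys.size();
        keys.add(key.clone());
        ids[slot] = id;
        if (keys.size() * 2 > ids.length) {
            rehash(); // keep load factor <= 0.5 so probe runs stay short
        }
        return id;
    }

    private void rehash() {
        int[] newIds = new int[ids.length * 2];
        Arrays.fill(newIds, -1);
        int newMask = newIds.length - 1;
        for (int id = 0; id < keys.size(); id++) {
            int slot = hash(keys.get(id)) & newMask;
            while (newIds[slot] != -1) {
                slot = (slot + 1) & newMask;
            }
            newIds[slot] = id;
        }
        ids = newIds;
        mask = newMask;
    }

    /** Stand-in hash (assumption): Arrays.hashCode plus a bit-mixing finalizer. */
    static int hash(byte[] key) {
        int h = Arrays.hashCode(key);
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

The only difference from another probing scheme would be the line
`slot = (slot + 1) & mask`. Whether the sequential access actually pays off
would have to be measured on a real indexing benchmark.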
bq. I added a small test case, confirming our MurmurHash3 impl matches a
separate Python/C impl I found
Maybe we could add Guava as a test dependency and do some duels on random bytes?
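
A rough sketch of what such a duel could look like, using only the JDK so it
stays self-contained. Both MurmurHash3 (x86, 32-bit) implementations below are
written for this example and are not Lucene's code; all names are hypothetical.
In a real test one side would presumably be replaced by Guava, e.g.
Hashing.murmur3_32(seed).hashBytes(bytes).asInt(), assuming Guava is available
as a test dependency.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Random;
import java.util.function.ToIntFunction;

/** Illustrative only: duel two MurmurHash3 x86_32 implementations on random bytes. */
final class HashDuelSketch {

    /** Array-based implementation with the tail handled by a fall-through switch. */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & ~3; // last offset that starts a full 4-byte block
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1; k1 = Integer.rotateLeft(k1, 15); k1 *= c2;
            h1 ^= k1; h1 = Integer.rotateLeft(h1, 13); h1 = h1 * 5 + 0xe6546b64;
        }
        int k1 = 0;
        switch (data.length & 3) {
            case 3: k1 = (data[roundedEnd + 2] & 0xff) << 16; // fall through
            case 2: k1 |= (data[roundedEnd + 1] & 0xff) << 8; // fall through
            case 1: k1 |= data[roundedEnd] & 0xff;
                k1 *= c1; k1 = Integer.rotateLeft(k1, 15); k1 *= c2;
                h1 ^= k1;
        }
        h1 ^= data.length;
        h1 ^= h1 >>> 16; h1 *= 0x85ebca6b; h1 ^= h1 >>> 13; h1 *= 0xc2b2ae35; h1 ^= h1 >>> 16;
        return h1;
    }

    /** Second implementation, structured differently (little-endian ByteBuffer reads). */
    static int murmur3_32ViaBuffer(byte[] data, int seed) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int h = seed;
        while (buf.remaining() >= 4) {
            h ^= mixK(buf.getInt());
            h = Integer.rotateLeft(h, 13) * 5 + 0xe6546b64;
        }
        if (buf.hasRemaining()) {
            int k = 0;
            for (int shift = 0; buf.hasRemaining(); shift += 8) {
                k |= (buf.get() & 0xff) << shift;
            }
            h ^= mixK(k);
        }
        h ^= data.length;
        h ^= h >>> 16; h *= 0x85ebca6b; h ^= h >>> 13; h *= 0xc2b2ae35; h ^= h >>> 16;
        return h;
    }

    private static int mixK(int k) {
        return Integer.rotateLeft(k * 0xcc9e2d51, 15) * 0x1b873593;
    }

    /** Returns the iteration of the first disagreement, or -1 if the two always agree. */
    static int duel(ToIntFunction<byte[]> a, ToIntFunction<byte[]> b, long randomSeed, int iters) {
        Random random = new Random(randomSeed);
        for (int i = 0; i < iters; i++) {
            byte[] bytes = new byte[random.nextInt(64)]; // lengths 0..63 cover every tail size
            random.nextBytes(bytes);
            if (a.applyAsInt(bytes) != b.applyAsInt(bytes)) {
                return i;
            }
        }
        return -1;
    }
}
```

Something like
duel(b -> murmur3_32(b, 0), b -> murmur3_32ViaBuffer(b, 0), 42L, 10000)
should then return -1. Randomizing the hash seed per iteration as well would
make the duel a bit stronger.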
> Should we switch BytesRefHash to MurmurHash3?
> ---------------------------------------------
>
> Key: LUCENE-5604
> URL: https://issues.apache.org/jira/browse/LUCENE-5604
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: BytesRefHash.perturb.patch, LUCENE-5604.patch,
> LUCENE-5604.patch
>
>
> MurmurHash3 has better hashing distribution than the current hash function we
> use for BytesRefHash which is a simple multiplicative function with 31
> multiplier (same as Java's String.hashCode, but applied to bytes not chars).
> Maybe we should switch ...