[ https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-5604:
---------------------------------------
Attachment: LUCENE-5604.patch
Initial patch: Lucene tests pass, but SolrJ doesn't compile yet.
I factored Hash.murmurhash3_x86_32 out of Solr into Lucene's StringHelper,
and cut BytesRef.hash, TermToBytesRefAttribute.fillBytesRef, and
BytesRefHash over to it.
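For reference, here is a standalone sketch of the standard, public-domain MurmurHash3 x86_32 algorithm over a byte slice. The class name Murmur3Sketch and the (data, offset, len, seed) parameter list are assumptions for illustration; this is a from-scratch rendering of the published algorithm, not the code in the patch or in StringHelper.

```java
// Hypothetical standalone sketch of MurmurHash3 x86_32 (Austin Appleby's
// public-domain algorithm). Not the code from the patch.
final class Murmur3Sketch {
  private Murmur3Sketch() {}

  static int murmurhash3_x86_32(byte[] data, int offset, int len, int seed) {
    final int c1 = 0xcc9e2d51;
    final int c2 = 0x1b873593;

    int h1 = seed;
    // Body: consume the input four bytes at a time, read little-endian.
    int roundedEnd = offset + (len & 0xfffffffc);
    for (int i = offset; i < roundedEnd; i += 4) {
      int k1 = (data[i] & 0xff)
          | ((data[i + 1] & 0xff) << 8)
          | ((data[i + 2] & 0xff) << 16)
          | (data[i + 3] << 24);
      k1 *= c1;
      k1 = Integer.rotateLeft(k1, 15);
      k1 *= c2;

      h1 ^= k1;
      h1 = Integer.rotateLeft(h1, 13);
      h1 = h1 * 5 + 0xe6546b64;
    }

    // Tail: mix in the remaining 1-3 bytes (switch fall-through is intentional).
    int k1 = 0;
    switch (len & 0x03) {
      case 3:
        k1 = (data[roundedEnd + 2] & 0xff) << 16;
      case 2:
        k1 |= (data[roundedEnd + 1] & 0xff) << 8;
      case 1:
        k1 |= data[roundedEnd] & 0xff;
        k1 *= c1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= c2;
        h1 ^= k1;
      default:
        break;
    }

    // Finalization: fold in the length, then the fmix32 avalanche step.
    h1 ^= len;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }
}
```

Every input byte passes through multiply/rotate mixing and the final fmix32 step, so a one-bit change in the input tends to flip about half the output bits. That is the "better distribution" the issue is after.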
I left some nocommits: should we change TermToBytesRefAttribute so it no
longer returns this hashCode, and also remove the BytesRefHash.add method that
takes a hashCode? It seems awkward to expose BytesRefHash's hash code
implementation so publicly; it should stay under the hood.
I also randomized (salted) the hash seed per JVM instance, an idea poached from
Guava, by setting a common static seed at JVM init (just
System.currentTimeMillis()). This should frustrate denial-of-service
(hash-flooding) attacks, and should also catch any places where we rely on this
hash function not changing across JVM instances (e.g. persisting hashes to disk
somewhere).
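A minimal sketch of the per-JVM seeding idea, assuming the seed lives in a static final field initialized once at class-load time. SeededBytesHash and HASH_SEED are hypothetical names, not the identifiers in the patch. The mixing function is a compact copy of standard MurmurHash3 x86_32, included so the block compiles on its own.

```java
// Hypothetical sketch: one hash seed per JVM, chosen when the class loads.
final class SeededBytesHash {
  private SeededBytesHash() {}

  // Taken from the wall clock (System.currentTimeMillis(), as described above).
  // Every hash in this JVM uses the same seed; another JVM almost surely differs.
  static final int HASH_SEED = (int) System.currentTimeMillis();

  // Seeded hash of a whole array. Results are only meaningful inside this JVM,
  // so they must never be persisted or compared across processes.
  static int hash(byte[] bytes) {
    return murmurhash3_x86_32(bytes, 0, bytes.length, HASH_SEED);
  }

  // Compact standard MurmurHash3 x86_32 (little-endian 4-byte blocks).
  static int murmurhash3_x86_32(byte[] data, int offset, int len, int seed) {
    final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
    int h1 = seed;
    int end = offset + (len & ~3);
    for (int i = offset; i < end; i += 4) {
      int k = (data[i] & 0xff) | (data[i + 1] & 0xff) << 8
          | (data[i + 2] & 0xff) << 16 | data[i + 3] << 24;
      h1 ^= Integer.rotateLeft(k * c1, 15) * c2;
      h1 = Integer.rotateLeft(h1, 13) * 5 + 0xe6546b64;
    }
    int k = 0;
    switch (len & 3) { // intentional fall-through over the 1-3 tail bytes
      case 3: k ^= (data[end + 2] & 0xff) << 16;
      case 2: k ^= (data[end + 1] & 0xff) << 8;
      case 1:
        k ^= data[end] & 0xff;
        h1 ^= Integer.rotateLeft(k * c1, 15) * c2;
      default: break;
    }
    h1 ^= len;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    h1 ^= h1 >>> 16;
    return h1;
  }
}
```

Within one run, hash(bytes) is stable, so an in-memory table works normally. On the next run HASH_SEED is different. Any code that quietly persisted or hard-coded a hash value would start failing, which is the hidden dependency the seed is meant to flush out.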
> Should we switch BytesRefHash to MurmurHash3?
> ---------------------------------------------
>
> Key: LUCENE-5604
> URL: https://issues.apache.org/jira/browse/LUCENE-5604
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5604.patch
>
>
> MurmurHash3 has a better hash distribution than the current hash function we
> use for BytesRefHash, which is a simple multiplicative function with a
> multiplier of 31 (same as Java's String.hashCode, but applied to bytes, not
> chars). Maybe we should switch ...
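For comparison, here is a sketch of the multiplicative hash the issue describes: the String.hashCode recurrence h = 31 * h + b, run over bytes instead of chars. The class name is hypothetical, and treating each byte as a signed value is an assumption of this sketch.

```java
// Hypothetical sketch of a String.hashCode-style hash over a byte slice.
final class Multiplier31Hash {
  private Multiplier31Hash() {}

  static int hash(byte[] bytes, int offset, int length) {
    int h = 0;
    final int end = offset + length;
    for (int i = offset; i < end; i++) {
      // Each byte is sign-extended to int; int overflow wraps silently.
      h = 31 * h + bytes[i];
    }
    return h;
  }
}
```

For ASCII input this matches String.hashCode, so the same trivial collisions carry over. The byte strings "Aa" and "BB" both hash to 2112. Because both are two bytes long, concatenations such as "AaAa", "AaBB", "BBAa" and "BBBB" also all collide. An attacker can build arbitrarily many equal-hash keys this way, which is the hash-flooding weakness a seeded, well-mixed hash avoids.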
--
This message was sent by Atlassian JIRA
(v6.2#6252)