[ 
https://issues.apache.org/jira/browse/LUCENE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5604:
---------------------------------------

    Attachment: LUCENE-5604.patch

Initial patch. Lucene tests pass, but solrj doesn't compile yet.

I factored out Hash.murmurhash3_x86_32 from Solr into Lucene's StringHelper, 
and cut over BytesRef.hash, TermToBytesRefAttribute.fillBytesRef, and 
BytesRefHash.
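
For readers who have not seen the function, here is a minimal sketch of
the 32-bit x86 variant of MurmurHash3, following the public-domain
reference algorithm. The class name MurmurHash3Sketch and the
(data, offset, len, seed) signature are assumptions for illustration,
not necessarily what the patch adds to StringHelper.

```java
// Illustrative sketch only; the class name is hypothetical.
final class MurmurHash3Sketch {

    /** Hashes len bytes of data starting at offset, mixed with the given seed. */
    static int murmurhash3_x86_32(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;

        int h1 = seed;
        // Round the length down to a multiple of 4 bytes.
        int roundedEnd = offset + (len & 0xfffffffc);

        // Body: mix one little-endian 4-byte block per iteration.
        for (int i = offset; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;

            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: fold in the remaining 0 to 3 bytes.
        int k1 = 0;
        switch (len & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }

        // Finalization: mix in the length, then avalanche the bits.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }
}
```

Taking the seed as an explicit parameter keeps the function pure, so a
caller can pass a fixed seed where stable values are needed and a
per-JVM seed elsewhere.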

I left some nocommits. I think we should change TermToBytesRefAttribute so
that it no longer returns this hashCode, and also remove the
BytesRefHash.add method that takes a hashCode. It seems awkward to make the
hash code impl of BytesRefHash so public; it should be under the hood.

I also randomized/salted the hash seed per JVM instance (an idea borrowed
from Guava) by setting a common static seed on JVM init (just
System.currentTimeMillis()). This should frustrate denial of service
attacks, and it can also catch any places where we rely on this hash
function not changing across JVM instances (e.g. persisting to disk
somewhere).
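
A rough sketch of the salting idea, under stated assumptions: the class
name SaltedHashSketch, the SEED field, and both hash methods are
hypothetical, and a simple multiplier-31 byte hash stands in for
MurmurHash3 to keep the example short.

```java
// Illustrative sketch only; all names here are hypothetical.
final class SaltedHashSketch {

    // Set once at class initialization: constant within one JVM,
    // but expected to differ from one JVM instance to the next.
    static final int SEED = (int) System.currentTimeMillis();

    /** Stand-in hash: multiplier-31 over the bytes, starting from the seed. */
    static int hash(byte[] bytes, int seed) {
        int h = seed;
        for (byte b : bytes) {
            h = 31 * h + b;
        }
        return h;
    }

    /** Hashes with the per-JVM seed; results must never be persisted. */
    static int hash(byte[] bytes) {
        return hash(bytes, SEED);
    }
}
```

Because the seed changes between runs, any code that writes these hash
values to disk and reads them back in another JVM would break, which is
the kind of hidden dependency the salting is meant to expose.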


> Should we switch BytesRefHash to MurmurHash3?
> ---------------------------------------------
>
>                 Key: LUCENE-5604
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5604
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.9, 5.0
>
>         Attachments: LUCENE-5604.patch
>
>
> MurmurHash3 has a better hash distribution than the hash function we
> currently use for BytesRefHash, which is a simple multiplicative function
> with multiplier 31 (the same as Java's String.hashCode, but applied to
> bytes, not chars). Maybe we should switch ...


