separate chararrayset interface from impl
-----------------------------------------
Key: LUCENE-2227
URL: https://issues.apache.org/jira/browse/LUCENE-2227
Project: Lucene - Java
Issue Type: Task
Components: Analysis
Affects Versions: 3.0
Reporter: Robert Muir
Priority: Minor
CharArraySet should be abstract.
The hashing implementation currently in use should instead be called
CharArrayHashSet.
Currently our 'CharArrayHashSet' is hardcoded across Lucene, but others might
want their own implementation.
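A minimal sketch of the proposed split, not Lucene's actual code. The class
names follow the proposal above; the String-backed storage, the constructor,
and the CharSequence overload are assumptions made to keep the example short.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical abstract contract: callers depend only on this type.
abstract class CharArraySet {
    // Membership test on a slice of a char buffer, as a token filter would call it.
    abstract boolean contains(char[] text, int off, int len);

    // Convenience overload (an assumption, added for illustration).
    boolean contains(CharSequence cs) {
        char[] chars = cs.toString().toCharArray();
        return contains(chars, 0, chars.length);
    }
}

// Hypothetical hash-based implementation. For brevity it is backed by a
// HashSet<String>; a real implementation would hash the char[] slice
// directly to avoid allocating a String per lookup.
final class CharArrayHashSet extends CharArraySet {
    private final Set<String> words = new HashSet<>();

    CharArrayHashSet(Iterable<String> entries) {
        for (String s : entries) {
            words.add(s);
        }
    }

    @Override
    boolean contains(char[] text, int off, int len) {
        // Hashing reads every character of the slice before any comparison.
        return words.contains(new String(text, off, len));
    }
}
```

With this shape, code that takes a CharArraySet would accept any alternative
implementation without changes.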
For example, implementing CharArraySet as a DFA with
org.apache.lucene.util.automaton gives faster contains(char[], int, int)
performance, as it can 'fast fail' on the first character that has no
matching transition and need not hash the entire string.
This is useful because it speeds up indexing in StopFilter.
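The 'fast fail' idea can be illustrated with a plain trie, which is a simple
tree-shaped DFA. This is only a sketch: it does not use
org.apache.lucene.util.automaton, and the class name TrieCharSet and its
methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie-backed set; stands in for a real automaton.
final class TrieCharSet {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean accept; // true if a stored word ends at this node
    }

    private final Node root = new Node();

    void add(String word) {
        Node n = root;
        for (int i = 0; i < word.length(); i++) {
            n = n.next.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        n.accept = true;
    }

    boolean contains(char[] text, int off, int len) {
        Node n = root;
        for (int i = off; i < off + len; i++) {
            n = n.next.get(text[i]);
            if (n == null) {
                // Fast fail: stop at the first character with no transition,
                // without reading the rest of the slice.
                return false;
            }
        }
        return n.accept;
    }
}
```

A token whose first character starts no stored word is rejected after a single
lookup, whereas a hash-based set has to read the whole token to compute its
hash.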
I did not think this would be faster, but I ran benchmarks over and over with
the Reuters corpus, and it is, even with English text's weird average word
length of 5.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.