Grant Ingersoll wrote:
On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote:
Grant Ingersoll wrote:
What's your current chain of TokenFilters? How many exceptions do you expect?
That is, could you enumerate them?
Very few, and yes, I could enumerate them, but I'm not sure exactly what you are
suggesting. What I was going to do was add them to the charConvertMap (when I
posted, I thought this only worked for individual chars, not strings).
You could modify whichever filter is removing them to take in a protected
words list, and then short-circuit so that it does not remove those tokens. That
check is a single hash map lookup, which should be faster than the char
replacement you are considering. Many of the stemmers do this.
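Here is a minimal, Lucene-free sketch of the protected-words short circuit. It assumes a filter step that would otherwise drop punctuation-only tokens. The class and method names (`ProtectedWordsFilterSketch`, `filter`, `isPunctuationOnly`) are hypothetical and are not Lucene API. In a real analyzer, the same check would sit inside a custom `TokenFilter`'s `incrementToken()`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Lucene API: models a filter that drops
// punctuation-only tokens unless they appear in a protected-words set.
class ProtectedWordsFilterSketch {

    // True if the token is non-empty and contains no letters or digits.
    static boolean isPunctuationOnly(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (Character.isLetterOrDigit(token.charAt(i))) {
                return false;
            }
        }
        return !token.isEmpty();
    }

    // Keeps protected tokens unchanged; otherwise drops punctuation-only tokens.
    static List<String> filter(List<String> tokens, Set<String> protectedWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            // Short circuit: with a HashSet this is an expected O(1) lookup,
            // done before any removal logic runs.
            if (protectedWords.contains(token)) {
                out.add(token);
                continue;
            }
            if (!isPunctuationOnly(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> protectedWords = new HashSet<>(List.of("!!!", "+/-"));
        List<String> tokens = List.of("the", "!!!", "band", "-", "+/-");
        System.out.println(filter(tokens, protectedWords));
        // prints [the, !!!, band, +/-]
    }
}
```

This only helps if the tokens actually reach the filter. If the tokenizer discards them first, the protected-words check never sees them.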
Hmm, they are removed by the tokenizer, not by a filter, because they are
punctuation chars. I suppose I could try to modify the JFlex grammar file.