Grant Ingersoll wrote:
On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote:

Grant Ingersoll wrote:
What's your current chain of TokenFilters?  How many exceptions do you expect?  
That is, could you enumerate them?
Very few, yes I could enumerate them, but not sure what exactly you are 
suggesting, what I was going to do would be add to the charConvertMap (when I 
posted I thought this was only for individual chars not strings)

You could have modify whichever filter is removing them to take in a protected 
words list and then short circuit to not remove that token.  This would be a 
hash map lookup, which should be faster than the char replacement you are 
considering. Many of the stemmers do this.


Hmm, they are removed by the tokenizer not a filter because they are punctuation chars, I suppose I could try and modify the jflex file

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to