Re: Dealing with special cases in analyser

Paul Taylor Wed, 17 Mar 2010 23:50:40 -0700

Grant Ingersoll wrote:

On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote:

Grant Ingersoll wrote:

What's your current chain of TokenFilters?  How many exceptions do you expect?  
That is, could you enumerate them?

Very few, yes I could enumerate them, but not sure what exactly you are 
suggesting, what I was going to do would be add to the charConvertMap (when I 
posted I thought this was only for individual chars not strings)


You could have modify whichever filter is removing them to take in a protected 
words list and then short circuit to not remove that token.  This would be a 
hash map lookup, which should be faster than the char replacement you are 
considering. Many of the stemmers do this.

Hmm, they are removed by the tokenizer not a filter because they arepunctuation chars, I suppose I could try and modify the jflex file


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Dealing with special cases in analyser

Reply via email to