Is there a good way to handle the following scenario? I have certain terms with embedded periods that I want to keep intact (not split at the periods). For example, in my application a particular skill might be SAP.FIN (SAP financial), and it should not be split into SAP and FIN. Is there a way to specify a list of such terms that should not be split? I am currently using my own "SynonymAnalyzer", whose token stream looks like the code below (pretty standard, I think); engine is a custom SynonymEngine through which I provide the synonyms. Is there a typical way to handle this situation?

public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new SnowballFilter(
        new SynonymFilter(
            new StopFilter(
                new LowerCaseFilter(
                    new StandardFilter(
                        new StandardTokenizer(reader))),
                StandardAnalyzer.STOP_WORDS),
            engine),
        "English");
    return result;
}

Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
[EMAIL PROTECTED]
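
To make the desired behavior concrete, here is a minimal sketch in plain Java. It does not use Lucene at all. The class ProtectedTermSplitter and its tokenize method are hypothetical names invented for this illustration, and the splitting rule (break at any non-alphanumeric character) is an assumption, not a description of what StandardTokenizer does. It only shows the intended outcome: a term on the protected list survives as one lowercased token, while other dotted text is split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ProtectedTermSplitter {

    // Hypothetical helper, not part of Lucene. Splits on whitespace first,
    // then breaks each chunk at non-alphanumeric characters unless the
    // lowercased chunk appears in the protected list.
    public static List<String> tokenize(String text, Set<String> protectedTerms) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            // Drop leading/trailing punctuation, e.g. a trailing comma or period.
            String core = chunk.replaceAll("^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "");
            if (core.isEmpty()) {
                continue;
            }
            String lower = core.toLowerCase(Locale.ROOT);
            if (protectedTerms.contains(lower)) {
                // Protected term: keep the embedded period.
                tokens.add(lower);
            } else {
                // Ordinary chunk: split at every run of non-alphanumerics.
                for (String part : lower.split("[^a-z0-9]+")) {
                    if (!part.isEmpty()) {
                        tokens.add(part);
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Protected terms are stored lowercased, matching the lowercasing above.
        Set<String> keep = Set.of("sap.fin");
        System.out.println(tokenize("Skills: SAP.FIN, java.util and SAP.", keep));
        // prints [skills, sap.fin, java, util, and, sap]
    }
}
```

In the analyzer above, an equivalent check would presumably have to happen at or before the tokenizer stage, since the later filters only see tokens that have already been split.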