Is there a good way to handle the following scenario?

I have certain terms with embedded periods that I want to keep intact (not split at the periods). For example, in my application a particular skill might be SAP.FIN (SAP Financial), and it should not be split into SAP and FIN. Is there a way to specify a list of such terms that should not be split? I am currently using my own "SynonymAnalyzer", whose token stream is built as shown below (fairly standard, I think); engine is a custom SynonymEngine through which I provide the synonyms. Is there a typical way to handle this situation?

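// Lucene 2.x API; SynonymFilter and the SynonymEngine "engine" are my own classes.
public TokenStream tokenStream(String fieldName, Reader reader) {
    // Build the chain from the tokenizer outward: tokenize, apply StandardFilter,
    // lowercase, drop stop words, inject synonyms, then apply the English Snowball stemmer.
    TokenStream result = new StandardTokenizer(reader);
    result = new StandardFilter(result);
    result = new LowerCaseFilter(result);
    result = new StopFilter(result, StandardAnalyzer.STOP_WORDS);
    result = new SynonymFilter(result, engine);
    result = new SnowballFilter(result, "English");
    return result;
}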
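One possible approach, sketched here under assumptions: protect the listed terms before the tokenizer sees them by rewriting each one to a period-free alias (SAP.FIN becomes sapfin), so there is no period left to split on. ProtectedTerms, rewrite, wrap and the alias scheme (lowercase, periods removed) are hypothetical names chosen for illustration, not Lucene API. The sketch is plain Java, so it runs without Lucene on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of Lucene: rewrites each protected term
 * (e.g. "SAP.FIN") to a period-free alias (e.g. "sapfin") before the text
 * reaches the tokenizer.
 */
public class ProtectedTerms {

    // Lowercased protected term -> alias emitted in its place.
    private final Map<String, String> aliases = new HashMap<>();
    private final Pattern pattern;

    public ProtectedTerms(String... terms) {
        if (terms.length == 0) {
            throw new IllegalArgumentException("at least one protected term is required");
        }
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            String lower = term.toLowerCase(Locale.ROOT);
            aliases.put(lower, lower.replace(".", ""));
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        // Case-insensitive whole-term match: the lookbehind rejects a preceding
        // letter, digit or period; the lookahead rejects a following letter or
        // digit (optionally after a period). So "SAP.FINANCE" and "SAP.FIN.AR"
        // are left alone, while a sentence-final "SAP.FIN." still matches.
        pattern = Pattern.compile(
                "(?<![\\p{L}\\p{N}.])(?:" + alternation + ")(?!\\.?[\\p{L}\\p{N}])",
                Pattern.CASE_INSENSITIVE);
    }

    /** Replaces every protected term in the text with its alias. */
    public String rewrite(String text) {
        Matcher m = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String alias = aliases.get(m.group().toLowerCase(Locale.ROOT));
            m.appendReplacement(out, Matcher.quoteReplacement(alias));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Drains a Reader and returns a new Reader over the rewritten text. */
    public Reader wrap(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StringReader(rewrite(sb.toString()));
    }

    public static void main(String[] args) {
        ProtectedTerms terms = new ProtectedTerms("SAP.FIN", "SAP.HR");
        System.out.println(terms.rewrite("Skills: SAP.FIN, sap.hr and SAP. FIN."));
        // prints: Skills: sapfin, saphr and SAP. FIN.
    }
}
```

In the analyzer above, the tokenizer line would become new StandardTokenizer(protectedTerms.wrap(reader)), where protectedTerms is a hypothetical field built once from the list of terms. Because the alias is what gets indexed, query text has to go through the same analyzer, and any synonyms for SAP.FIN in the SynonymEngine would need to be keyed on the alias. The rewrite also shifts character offsets, which matters for highlighting. Newer Lucene releases provide MappingCharFilter for this kind of pre-tokenization substitution with offset correction.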

Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh