Hi,

I might be missing something. I have a custom analyzer the gist of which is:

        public TokenStream tokenStream(String fieldName, Reader reader)
        {
                TokenStream result = new StandardTokenizer(reader);
                result = new StandardFilter(result);
                result = new LowerCaseFilter(result);
                result = new StopFilter(result, stopSet);
                result = new PorterStemFilter(result);
                return result;
        }

I test my above analyzer with the following query string:
the is EOS-20D canon amazing

In my test code I do this  to see what my analyzed query string looks like:
        
                PerFieldAnalyzerWrapper analyzer = new 
PerFieldAnalyzerWrapper(new StandardStemmingAnalyzer());
                analyzer.addAnalyzer("categoryNames", new KeywordAnalyzer());

                TokenStream stream = analyzer.tokenStream(null, new 
StringReader(queryString));
                String analyzedQueryString = "";
                
                while(true)
                {
                        Token token = stream.next();
                        if(token == null)
                        {
                                break;
                        }
                
                        analyzedQueryString = analyzedQueryString + 
token.termText() + " ";
                }
                
                analyzedQueryString = analyzedQueryString.trim();

                log.debug("analyzedQueryString = " + analyzedQueryString);

The output of the log statement above is:

analyzedQueryString = eos-20d canon amaz

I see that the common stop words have been removed, everything has been lower 
cased and even the query has also been stemmed, why was the hyphen not removed 
by the standard filter??? Or does the standard analyzer remove hyphens only 
from phrases like "eos - 20d" and not from "eos-20d" ?

Thanks.

Reply via email to