That's the *StandartTokenizer*, which is not at all identical to StandardAnalyzer. From the Javadoc for StandardAnalyzer: Filters StandardTokenizer<http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/standard/StandardTokenizer.html> with StandardFilter<http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/standard/StandardFilter.html> , LowerCaseFilter<http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/LowerCaseFilter.html> and StopFilter<http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/StopFilter.html>, using a list of English stop words.
So your case issue is entirely explained. You have no reason at all to expect stemming to work unless you put a stemmer in the chain.... HTH Erick On Thu, Apr 29, 2010 at 5:10 PM, Justin <cry...@yahoo.com> wrote: > I'm using my own analyzer so I can interject HTMLStripCharFilter as > described in a previous thread. > > private static Analyzer htmlStripAnalyzer = new ReusableAnalyzerBase() { > @Override > protected TokenStreamComponents createComponents( > final String fieldName, final Reader reader) { > return new TokenStreamComponents(new > StandardTokenizer(Version.LUCENE_30, > new HTMLStripCharFilter(CharReader.get(reader)))); > } > }; > > So should I replace StandardTokenizer with LowerCaseTokenizer above? Then > would I override nextToken() and stem or is there a better way to use > something like SnowballFilter? > > > > > ----- Original Message ---- > From: Erick Erickson <erickerick...@gmail.com> > To: java-user@lucene.apache.org > Sent: Thu, April 29, 2010 3:30:09 PM > Subject: Re: Highlighter usage > > What analyzer are you using at index time? My guess is something > like WhitespaceAnalyzer that doesn't stem or change case..... Try > a different analyzer, SimpleAnalyzer comes to mind.... > > HTH > Erick > > On Thu, Apr 29, 2010 at 4:21 PM, Justin <cry...@yahoo.com> wrote: > > > I'm trying to use Highlighter with QueryScorer after reading: > > > > https://issues.apache.org/jira/browse/LUCENE-1685 > > > > The problem is: I'm not getting a result unless my the query term is an > > exact match. Am I missing filters? Is there a more complete example of > how > > this should work? > > > > > > String content = "Global Climate Change affects us all"; > > > > String field = "content"; > > BooleanQuery query = new BooleanQuery(); > > > > // Unstemmed, matched case works > > //query.add(new TermQuery(new Term(field, "Climate")), > > BooleanClause.Occur.MUST); > > > > // Stemmed, lowercase doesn't work > > query.add(new TermQuery(new Term(field, "climat")), > > BooleanClause.Occur.MUST); > > query.add(new TermQuery(new Term(field, "affect")), > > BooleanClause.Occur.MUST); > > > > Highlighter highlighter = new Highlighter(new QueryScorer(query, > > field)); > > highlighter.setMaxDocCharsToAnalyze(500000); > > > > TokenStream ts = htmlStripAnalyzer.tokenStream(field, new > > StringReader(content)); > > ts = new CachingTokenFilter(ts); > > > > System.out.println(highlighter.getBestFragment(ts, content)); > > > > > > Thanks for any feedback, > > Justin > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >