Hello, I have a problem when using n-gram tokens with the highlighter. I thought it had been solved in this ticket:
http://issues.apache.org/jira/browse/LUCENE-627

I actually found this problem while using CJKTokenizer on Solr, but here is a Lucene program that reproduces it using NGramTokenizer(min=2,max=2) instead of CJKTokenizer:

// Imports below follow the Lucene 2.x package layout.
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class TestNGramHighlighter {

  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new NGramAnalyzer();
    final String TEXT = "ABCDEFGHIJKLMNABCDEFGHIJKLMN";
    final String QUERY = "GHI";

    // Parse the query with the same 2-gram analyzer used for the text.
    QueryParser parser = new QueryParser("f", analyzer);
    Query query = parser.parse(QUERY);
    QueryScorer scorer = new QueryScorer(query, "f");
    Highlighter h = new Highlighter(scorer);

    // Print the best fragment with the matched terms highlighted.
    System.out.println(h.getBestFragment(analyzer, "f", TEXT));
  }

  // Analyzer that splits the input into 2-grams (min=2, max=2).
  static class NGramAnalyzer extends Analyzer {
    public TokenStream tokenStream(String field, Reader input) {
      return new NGramTokenizer(input, 2, 2);
    }
  }
}

The expected output is:

  ABCDEF<B>GHI</B>JKLMNABCDEF<B>GHI</B>JKLMN

but the actual output is:

  ABCDEF<B>GHIJKLMNABCDEFGHI</B>JKLMN

Am I missing something?

Thank you,

Koji
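
P.S. In case it helps to pin down what I expected, here is a small sketch in plain Java (no Lucene dependency) that lists the 2-grams and character offsets I would expect for the query and the text. The class BigramOffsets and its methods are hypothetical helpers written only for this illustration, and the sketch assumes the tokenizer slides a 2-character window over the input and reports start/end offsets in that way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the 2-grams of a string
// together with their start/end character offsets.
class BigramOffsets {

    // Returns every 2-character window of s, formatted as "gram@start-end".
    static List<String> bigrams(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            out.add(s.substring(i, i + 2) + "@" + i + "-" + (i + 2));
        }
        return out;
    }

    // Returns the 2-grams of text whose term also occurs among the 2-grams of query.
    static List<String> matching(String text, String query) {
        List<String> queryTerms = new ArrayList<>();
        for (int i = 0; i + 2 <= query.length(); i++) {
            queryTerms.add(query.substring(i, i + 2));
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            String gram = text.substring(i, i + 2);
            if (queryTerms.contains(gram)) {
                out.add(gram + "@" + i + "-" + (i + 2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [GH@0-2, HI@1-3]
        System.out.println(bigrams("GHI"));
        // prints [GH@6-8, HI@7-9, GH@20-22, HI@21-23]
        System.out.println(matching("ABCDEFGHIJKLMNABCDEFGHIJKLMN", "GHI"));
    }
}
```

Under that assumption the matching grams cover offsets 6-9 and 20-23 only, which are the two separate GHI spans in the expected output above, with nothing matching in between.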