Hello,

I have a problem when using n-gram tokenization together with the highlighter.
I thought it had been solved in this ticket:

http://issues.apache.org/jira/browse/LUCENE-627

I actually ran into this problem while using CJKTokenizer
on Solr, but here is a Lucene program that reproduces it
using NGramTokenizer(min=2,max=2) instead of CJKTokenizer:

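import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class TestNGramHighlighter {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new NGramAnalyzer();
    final String TEXT = "ABCDEFGHIJKLMNABCDEFGHIJKLMN";
    final String QUERY = "GHI";
    // The same bigram analyzer is used for parsing the query and for highlighting.
    QueryParser parser = new QueryParser("f", analyzer);
    Query query = parser.parse(QUERY);
    QueryScorer scorer = new QueryScorer(query, "f");
    Highlighter h = new Highlighter(scorer);
    System.out.println(h.getBestFragment(analyzer, "f", TEXT));
  }

  // Emits overlapping 2-character grams (min=2, max=2).
  static class NGramAnalyzer extends Analyzer {
    public TokenStream tokenStream(String field, Reader input) {
      return new NGramTokenizer(input, 2, 2);
    }
  }
}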

The expected output is:
ABCDEF<B>GHI</B>JKLMNABCDEF<B>GHI</B>JKLMN

But the actual output is:
ABCDEF<B>GHIJKLMNABCDEFGHI</B>JKLMN
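
To show how I arrive at the expected output, here is a minimal plain-Java
sketch that does not use Lucene at all. It is only an illustration, not the
Highlighter's actual algorithm: the class BigramHighlightSketch and its
methods bigrams and highlight are hypothetical names made up for this
example. It assumes that a match is any bigram of the query found in the
text, and that contiguous matched characters are merged into one
highlighted span.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BigramHighlightSketch {
    // Split s into overlapping 2-character grams (hypothetical helper).
    static List<String> bigrams(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= s.length(); i++) {
            out.add(s.substring(i, i + 2));
        }
        return out;
    }

    // Mark every character covered by a bigram of the query, then wrap
    // each run of contiguous marked characters in <B>...</B>.
    static String highlight(String text, String query) {
        Set<String> terms = new HashSet<>(bigrams(query));
        boolean[] marked = new boolean[text.length()];
        for (int i = 0; i + 2 <= text.length(); i++) {
            if (terms.contains(text.substring(i, i + 2))) {
                marked[i] = true;
                marked[i + 1] = true;
            }
        }
        StringBuilder sb = new StringBuilder();
        boolean open = false;
        for (int i = 0; i < text.length(); i++) {
            if (marked[i] && !open) { sb.append("<B>"); open = true; }
            if (!marked[i] && open) { sb.append("</B>"); open = false; }
            sb.append(text.charAt(i));
        }
        if (open) sb.append("</B>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // The query "GHI" becomes the two bigrams GH and HI.
        System.out.println(bigrams("GHI"));
        System.out.println(highlight("ABCDEFGHIJKLMNABCDEFGHIJKLMN", "GHI"));
    }
}
```

With this sketch, GH and HI overlap at each occurrence and are merged into a
single GHI span, while the two occurrences stay separate because the
characters between them are not matched.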

Am I missing something?

Thank you,

Koji

