I am using Lucene 4.0 and trying to use its highlighting feature. I am not
getting the desired result due to some mistake that I am not able to
identify. My source code looks like
String sourceText = "liver disease kidney transplant";
String termString ="\"transplant\"";
SimpleAnalyzer simpleAnalyzer = new SimpleAnalyzer(Version.LUCENE_40);
Query query = new QueryParser(Version.LUCENE_40,"contents",
simpleAnalyzer).parse(termString);
TokenStream tokenStream = simpleAnalyzer.tokenStream("contents", new
StringReader(sourceText));
QueryScorer scorer = new QueryScorer(query,"contents");
scorer.setExpandMultiTermQuery(true);
Fragmenter fragmenter = new SimpleSpanFragmenter(scorer);
SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter( "*",
"*") ;
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, scorer );
highlighter.setTextFragmenter(fragmenter);
highlighter.setMaxDocCharsToAnalyze(10000);
String resultString =
highlighter.getBestFragments(tokenStream,sourceText,1000, "...");
System.out.println("Source Text1 = "+sourceText);
System.out.println("Result Text1 = "+resultString);
sourceText = "for liver transplantation.";
tokenStream = simpleAnalyzer.tokenStream("contents", new
StringReader(sourceText));
resultString = highlighter.getBestFragments(tokenStream,sourceText,1000,
"...");
System.out.println("Source Text2 = "+sourceText);
System.out.println("Result Text2 = "+resultString);
For the first text, I am getting the result properly but not for the second
one
Source Text1 = liver disease kidney transplant
Result Text1 = liver disease kidney *transplant*
Source Text2 = for liver transplantation.
Result Text2 =
I am expecting the result for second one like
for liver *transplant*ation
or
for liver *transplantation*
What is wrong in my code?
--
View this message in context:
http://lucene.472066.n3.nabble.com/highlighting-tp542569p3178841.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]