Hello list,
I'm struggling again with the highlighter. I don't understand why I
sporadically get an InvalidTokenOffsetsException.
The mission: given a query, detect which field was matched among the names of
the concepts. There can be several names for a given concept, even within one
language. Concepts are documents, and names are stored in fields name-xx, where
xx is the two-letter language code.
Here's the method I'm using:
public String computeMatchedField(int docNum, Document doc, Analyzer analyzer,
        Query query) throws IOException {
    //System.out.println("----- computing matched field for query " + query + " on document " + doc.get("uri"));
    query = query.rewrite(this.reader);
    String found = null;
    float maxScore = 0;
    try {
        for (Field f : (List<Field>) doc.getFields()) {
            // Only the name-xx fields are candidates; skip the rest before
            // building a scorer for them.
            if (!f.name().startsWith("name-")) continue;
            QueryScorer scorer = new QueryScorer(query, reader, f.name());
            //System.out.println("Measuring field " + f.name() + ": " + f.stringValue());
            String text = f.stringValue();
            TokenStream tokenStream =
                TokenSources.getAnyTokenStream(reader, docNum, f.name(), doc, analyzer);
            SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
            Highlighter highlighter = new Highlighter(htmlFormatter, scorer);
            TextFragment[] frags =
                highlighter.getBestTextFragments(tokenStream, text, false, 1);
            if (frags == null || frags.length == 0) continue;
            float score = frags[0].getScore();
            //System.out.println("Score: " + score);
            // Keep the best-scoring fragment seen so far.
            if (score > maxScore) {
                maxScore = score;
                found = frags[0].toString();
            }
        }
    } catch (Exception ex) {
        // InvalidTokenOffsetsException ends up here and aborts the whole loop.
        ex.printStackTrace();
    }
    return found;
}
Unfortunately, I have to catch InvalidTokenOffsetsException, which happens
sometimes but not always.
When it occurs, it stops the highlighting (the detected field is "null") and
also costs quite some time.
What am I doing wrong?
I tried building my own TokenStream, but it made no difference.
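To narrow this down: as far as I understand, the highlighter throws
InvalidTokenOffsetsException when a token's start or end offset lies beyond the
length of the text it was given. Below is a small stand-alone sketch of that
condition, using only the JDK. The class OffsetCheck, the Span type, and the
whitespace tokenizer are my own illustration and not part of Lucene; the point
is only to show how tokens computed from one string can fail the bounds check
against a different, shorter string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Lucene code: mimics the offset bounds check.
final class OffsetCheck {

    /** Stand-in for a token with character offsets into its source text. */
    static final class Span {
        final String term;
        final int start;
        final int end;

        Span(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits on whitespace and records each token's start and end offsets. */
    static List<Span> tokenize(String text) {
        List<Span> spans = new ArrayList<Span>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) spans.add(new Span(text.substring(start, i), start, i));
        }
        return spans;
    }

    /** Returns the first span whose offsets exceed the text length, or null. */
    static Span firstInvalid(List<Span> spans, String text) {
        for (Span s : spans) {
            if (s.start > text.length() || s.end > text.length()) return s;
        }
        return null;
    }

    public static void main(String[] args) {
        // Tokens come from one value, but are checked against another value.
        List<Span> spans = tokenize("red blood cell");
        Span bad = firstInvalid(spans, "heart");
        System.out.println(bad == null ? "all offsets valid" : "invalid token: " + bad.term);
    }
}
```

Running such a check on the tokens and the text just before calling
getBestTextFragments might show whether the token stream and the text really
come from the same field value.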
thanks in advance
paul