trying to use the highlighter

Paul Libbrecht Fri, 03 Sep 2010 05:06:49 -0700

Hello list,

I'm struggling again with the highlighter. I don't understand why I 
sporadically get an InvalidTokenOffsetsException.


The mission: given a query, detect which field was matched among the names of 
the concepts. There can be several names for a given concept, even within one 
language. Concepts are documents, and names are stored in fields name-xx, where 
xx is the two-letter language code.

Here's the method I'm using:

    public String computeMatchedField(int docNum, Document doc, Analyzer analyzer, Query query) throws IOException {
        //System.out.println("----- computing matched field for query " + query + " on document " + doc.get("uri"));
        query = query.rewrite(this.reader);
        String found = null;
        float maxScore = 0;
        try {
            for(Field f: (List<Field>) doc.getFields()) {
                QueryScorer scorer = new QueryScorer(query,reader,f.name());
                if(!f.name().startsWith("name-")) continue;
                //System.out.println("Measuring field " + f.name() + ": " + f.stringValue());
                String text = f.stringValue();
                TokenStream tokenStream = TokenSources.getAnyTokenStream(reader,docNum, f.name(), doc, analyzer);
                SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
                Highlighter highlighter = new Highlighter(htmlFormatter, scorer);
                TextFragment[] frags = highlighter.getBestTextFragments(tokenStream, text, false, 1);
                if(frags==null || frags.length==0) continue;
                float score = frags[0].getScore();
                //System.out.println("Score: " + score);
                if(score > maxScore) {
                    maxScore = score;
                    found = frags[0].toString();
                }
            }
        } catch(Exception ex) {ex.printStackTrace();}
        return found;
    }

Unfortunately, I have to catch InvalidTokenOffsetsException, which happens 
sometimes but not always.
When it occurs, it stops the highlighting (the detected field is "null") and 
also costs quite some time.
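
For context, the highlighter raises this exception when a token's offsets point 
past the end of the text it was asked to highlight. One situation where that 
could arise, assuming a name-xx field occurs several times in a document, is 
when the token stream covers all values of the field while the text is a single 
value. Whether that is what happens in the method above is an assumption, not 
something verified. The sketch below is plain Java with no Lucene dependency; 
TokenOffsets, tokenize and exceedsText are hypothetical names used only to 
illustrate the offset check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token's character offsets; not a Lucene class.
final class TokenOffsets {
    final int start;
    final int end;
    TokenOffsets(int start, int end) { this.start = start; this.end = end; }
}

class OffsetCheckSketch {

    // Splits on spaces and records [start, end) offsets for each token.
    static List<TokenOffsets> tokenize(String text) {
        List<TokenOffsets> tokens = new ArrayList<>();
        int start = -1;
        for (int i = 0; i <= text.length(); i++) {
            boolean boundary = i == text.length() || text.charAt(i) == ' ';
            if (!boundary && start < 0) start = i;
            if (boundary && start >= 0) {
                tokens.add(new TokenOffsets(start, i));
                start = -1;
            }
        }
        return tokens;
    }

    // True when any token points past the end of the text to be highlighted.
    static boolean exceedsText(List<TokenOffsets> tokens, String text) {
        for (TokenOffsets t : tokens) {
            if (t.start > text.length() || t.end > text.length()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String first = "right angle";
        String second = "orthogonal angle";
        // Tokens computed over both values of a multi-valued field...
        List<TokenOffsets> all = tokenize(first + " " + second);
        // ...do not fit the single value passed as the text to highlight.
        System.out.println(exceedsText(all, first));              // true
        // Tokens computed over exactly the highlighted text always fit.
        System.out.println(exceedsText(tokenize(first), first));  // false
    }
}
```

If this is the cause, tokenizing exactly the string that is passed as the text 
to highlight would keep the offsets and the text consistent.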

What am I doing wrong?
I tried building my own TokenStream, but it made no difference.

thanks in advance

paul


