I am going to try to write up some more info for you tomorrow, but just to point out: I do think there is a bug in the way offsets are being handled. I don't think it is causing your current problem (what I mentioned earlier is), but it will probably cause you problems down the road. I will look into this further.

- Mark

Lukas Vlcek wrote:
Hi Lucene experts,

The following is a simple piece of Lucene code that throws a
StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0
release. Can anyone tell me what is wrong with this code? Is this a bug or
a feature of Lucene? Any comments/hints are highly welcome!

In a nutshell, I have a document with two field names (four Field instances in total):
1) all
2-4) small

I use [all] for searching and [small] for highlighting.
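Because [small] is added three times, it is a multi-valued field. In Lucene 2.2, `Document.get(String)` returns only the first stored value ("example"), whereas `Document.getValues(String)` returns all of them. The Lucene-free sketch below models the three values with a plain list and assumes, as the index 11 in the exception suggests, that the term-vector offsets of successive values are laid end to end. That offset arithmetic is an assumption for illustration, not something confirmed for Lucene 2.2, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class MultiValuedOffsets {
    // Hypothetical stand-in for the three stored values of the [small] field.
    static final List<String> SMALL_VALUES = Arrays.asList("example", "long", "text");

    // Assumption: each value's offsets start where the previous value ended.
    static int[][] offsets(List<String> values) {
        int[][] result = new int[values.size()][2];
        int base = 0;
        for (int i = 0; i < values.size(); i++) {
            result[i][0] = base;                           // start offset
            result[i][1] = base + values.get(i).length();  // end offset (exclusive)
            base = result[i][1];
        }
        return result;
    }

    public static void main(String[] args) {
        String firstValue = SMALL_VALUES.get(0);          // what get("small") would return
        String allValues = String.join("", SMALL_VALUES); // text the assumed offsets refer to
        int[][] offs = offsets(SMALL_VALUES);
        for (int i = 0; i < offs.length; i++) {
            System.out.println(SMALL_VALUES.get(i) + " [" + offs[i][0] + "," + offs[i][1] + ")");
        }
        System.out.println("first value length: " + firstValue.length());
        System.out.println("joined length: " + allValues.length());
    }
}
```

Under this assumption "long" spans characters 7 to 11, which the 7-character first value cannot contain.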

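package org.lucenetest;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TermPositionVector;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.TokenSources;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;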

public class MemoryIndexCase {
    static public void main(String[] arg) {

        Document doc = new Document();

        doc.add(new Field("all", "example long text",
                Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("small", "example",
                Field.Store.YES, Field.Index.UN_TOKENIZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field("small", "long",
                Field.Store.YES, Field.Index.UN_TOKENIZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field("small", "text",
                Field.Store.YES, Field.Index.UN_TOKENIZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));

        try {
            Directory idx = new RAMDirectory();
            IndexWriter writer = new IndexWriter(idx, new StandardAnalyzer(), true);

            writer.addDocument(doc);
            writer.optimize();
            writer.close();

            Searcher searcher = new IndexSearcher(idx);

            QueryParser qp = new QueryParser("all", new StandardAnalyzer());
            Query query = qp.parse("example text");
            Hits hits = searcher.search(query);

            Highlighter highlighter = new Highlighter(new QueryScorer(query));

            IndexReader ir = IndexReader.open(idx);
            for (int i = 0; i < hits.length(); i++) {

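                // Document.get() returns only the first stored value of a multi-valued field
                String text = hits.doc(i).get("small");

                // Rebuild a token stream from the stored term vector (positions + offsets)
                TermFreqVector tfv = ir.getTermFreqVector(hits.id(i), "small");
                TokenStream tokenStream =
                    TokenSources.getTokenStream((TermPositionVector) tfv);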

                String result =
                    highlighter.getBestFragment(tokenStream,text);
                System.out.println(result);
            }

        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}

The exception is:
java.lang.StringIndexOutOfBoundsException: String index out of range: 11
    at java.lang.String.substring(String.java:1935)
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:175)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:101)
    at org.lucenetest.MemoryIndexCase.main(MemoryIndexCase.java:70)
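The trace shows the failure inside `String.substring`, called from the highlighter, which is consistent with a token's term-vector offsets pointing past the end of the text passed to `getBestFragment`: "example" has only 7 characters. The plain-Java sketch below reproduces that kind of failure with hard-coded offsets 7 and 11 (hypothetical values chosen to match the reported index, not read from Lucene) and shows a hypothetical `safeSlice` guard that checks offsets before slicing. It does not call Lucene.

```java
public class OffsetGuard {
    // Returns the slice of text covered by [start, end), or null when the
    // offsets fall outside the text, instead of letting substring throw.
    static String safeSlice(String text, int start, int end) {
        if (start < 0 || end > text.length() || start > end) {
            return null;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String stored = "example";         // first stored value of [small]
        String joined = "examplelongtext"; // all values laid end to end (assumption)
        int start = 7, end = 11;           // hypothetical offsets of the token "long"

        try {
            stored.substring(start, end);  // end 11 exceeds length 7, so this throws
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded substring failed: " + e.getClass().getSimpleName());
        }
        System.out.println("guarded on stored text: " + safeSlice(stored, start, end));
        System.out.println("guarded on joined text: " + safeSlice(joined, start, end));
    }
}
```

In the original program, the analogous change would be to build the highlight text from every stored value, for example by joining the array returned by `hits.doc(i).getValues("small")`, rather than from `get("small")`. Whether the joined string lines up exactly with the stored offsets depends on how Lucene 2.2 computes offsets for multi-valued fields, which Mark suspects is buggy, so verify the output rather than rely on it.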

Best regards,
Lukas

