Hi Lucene experts,
Below is a simple Lucene program that throws a
StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0
release. Can anyone tell me what is wrong with this code? Is this a bug or
a feature of Lucene? Any comments/hints are highly welcome!
In a nutshell, I have a document with two fields (four if each instance of [small] is counted):
1) all
2-4) small
I use [all] for searching and [small] for highlighting.
[package and imports truncated...]
public class MemoryIndexCase {
    public static void main(String[] arg) {
        Document doc = new Document();
        doc.add(new Field("all", "example long text",
                Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("small", "example",
                Field.Store.YES, Field.Index.UN_TOKENIZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field("small", "long",
                Field.Store.YES, Field.Index.UN_TOKENIZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field("small", "text",
                Field.Store.YES, Field.Index.UN_TOKENIZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        try {
            Directory idx = new RAMDirectory();
            IndexWriter writer = new IndexWriter(idx, new StandardAnalyzer(), true);
            writer.addDocument(doc);
            writer.optimize();
            writer.close();
            Searcher searcher = new IndexSearcher(idx);
            QueryParser qp = new QueryParser("all", new StandardAnalyzer());
            Query query = qp.parse("example text");
            Hits hits = searcher.search(query);
            Highlighter highlighter = new Highlighter(new QueryScorer(query));
            IndexReader ir = IndexReader.open(idx);
            for (int i = 0; i < hits.length(); i++) {
                String text = hits.doc(i).get("small");
                TermFreqVector tfv = ir.getTermFreqVector(hits.id(i), "small");
                TokenStream tokenStream =
                        TokenSources.getTokenStream((TermPositionVector) tfv);
                String result = highlighter.getBestFragment(tokenStream, text);
                System.out.println(result);
            }
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}
The exception is:
java.lang.StringIndexOutOfBoundsException: String index out of range: 11
    at java.lang.String.substring(String.java:1935)
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:175)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:101)
    at org.lucenetest.MemoryIndexCase.main(MemoryIndexCase.java:70)
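My own unconfirmed guess at the cause, in case it helps: as far as I know, Document.get("small") returns only the first of the three stored values ("example", 7 characters), while the term vector may record offsets that keep counting across all three values. The plain-Java sketch below does not use Lucene at all. The class OffsetMismatchSketch and its helpers are hypothetical names of mine, and the cumulative-offset rule is an assumption about Lucene 2.2.0 that I have not verified in its source. It only shows that such offsets would not fit inside the first value, so a substring call ending at index 11 would fail.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetMismatchSketch {

    /** Start/end character offsets of one term, as a term vector might record them. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * ASSUMPTION (not verified against Lucene): offsets of successive values of
     * the same field continue where the previous value ended, instead of
     * restarting at 0 for each value.
     */
    static List<Span> cumulativeOffsets(String[] values) {
        List<Span> spans = new ArrayList<Span>();
        int offset = 0;
        for (String value : values) {
            spans.add(new Span(offset, offset + value.length()));
            offset += value.length();
        }
        return spans;
    }

    /** Returns true only if every span ends inside the given text. */
    static boolean offsetsFit(String text, List<Span> spans) {
        for (Span span : spans) {
            if (span.end > text.length()) {
                return false;
            }
        }
        return true;
    }

    /** Concatenates all values without a separator, matching the assumed offsets. */
    static String join(String[] values) {
        StringBuilder sb = new StringBuilder();
        for (String value : values) {
            sb.append(value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] values = {"example", "long", "text"};
        List<Span> spans = cumulativeOffsets(values);

        // Stand-in for what Document.get("small") hands to the highlighter.
        String firstValueOnly = values[0];

        System.out.println("fits first value only: " + offsetsFit(firstValueOnly, spans));
        System.out.println("fits all values joined: " + offsetsFit(join(values), spans));

        try {
            Span second = spans.get(1); // assumed offsets 7..11 for "long"
            firstValueOnly.substring(second.start, second.end);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

If this guess is right, giving the highlighter text that covers all three values (for example, built from Document.getValues("small")) rather than only the first one might avoid the exception, but I have not tried that.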
Best regards,
Lukas