Mark, thank you for this. I will wait for your other responses. This will keep me going :-)
I didn't know there was a design restriction in Lucene that the text and the TokenStream must be exactly the same (this still seems redundant to me; I will dig into the Lucene API more).

BR
Lukas

On 7/29/07, Mark Miller <[EMAIL PROTECTED]> wrote:
>
> I am going to try to write up some more info for you tomorrow, but
> just to point out: I do think there is a bug in the way offsets are
> being handled. I don't think this is causing your current problem (what
> I mentioned is), but it will probably cause you problems down the road. I
> will look into this further.
>
> - Mark
>
> Lukas Vlcek wrote:
> > Hi Lucene experts,
> >
> > The following is simple Lucene code that throws a
> > StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0
> > release. Can anyone tell me what is wrong with this code? Is this a bug
> > or a feature of Lucene? Any comments/hints are highly welcome!
> >
> > In a nutshell, I have a document with two (or four) fields:
> > 1) all
> > 2-4) small
> >
> > I use [all] for searching and [small] for highlighting.
> >
> > [package and imports truncated...]
> >
> > public class MemoryIndexCase {
> >     static public void main(String[] arg) {
> >
> >         Document doc = new Document();
> >
> >         doc.add(new Field("all", "example long text",
> >                 Field.Store.NO, Field.Index.TOKENIZED));
> >         doc.add(new Field("small", "example",
> >                 Field.Store.YES, Field.Index.UN_TOKENIZED,
> >                 Field.TermVector.WITH_POSITIONS_OFFSETS));
> >         doc.add(new Field("small", "long",
> >                 Field.Store.YES, Field.Index.UN_TOKENIZED,
> >                 Field.TermVector.WITH_POSITIONS_OFFSETS));
> >         doc.add(new Field("small", "text",
> >                 Field.Store.YES, Field.Index.UN_TOKENIZED,
> >                 Field.TermVector.WITH_POSITIONS_OFFSETS));
> >
> >         try {
> >             Directory idx = new RAMDirectory();
> >             IndexWriter writer = new IndexWriter(idx,
> >                     new StandardAnalyzer(), true);
> >
> >             writer.addDocument(doc);
> >             writer.optimize();
> >             writer.close();
> >
> >             Searcher searcher = new IndexSearcher(idx);
> >
> >             QueryParser qp = new QueryParser("all", new StandardAnalyzer());
> >             Query query = qp.parse("example text");
> >             Hits hits = searcher.search(query);
> >
> >             Highlighter highlighter = new Highlighter(
> >                     new QueryScorer(query));
> >
> >             IndexReader ir = IndexReader.open(idx);
> >             for (int i = 0; i < hits.length(); i++) {
> >
> >                 String text = hits.doc(i).get("small");
> >
> >                 TermFreqVector tfv = ir.getTermFreqVector(hits.id(i),
> >                         "small");
> >                 TokenStream tokenStream =
> >                         TokenSources.getTokenStream((TermPositionVector) tfv);
> >
> >                 String result =
> >                         highlighter.getBestFragment(tokenStream, text);
> >                 System.out.println(result);
> >             }
> >
> >         } catch (Throwable e) {
> >             e.printStackTrace();
> >         }
> >     }
> > }
> >
> > The exception is:
> > java.lang.StringIndexOutOfBoundsException: String index out of range: 11
> >     at java.lang.String.substring(String.java:1935)
> >     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
> >     at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:175)
> >     at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:101)
> >     at org.lucenetest.MemoryIndexCase.main(MemoryIndexCase.java:70)
> >
> > Best regards,
> > Lukas
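
To make the "text and TokenStream must match" restriction concrete, here is a small plain-Java sketch of how the reported index 11 could arise. It does not use Lucene at all and rests on two assumptions that should be checked against the Lucene 2.2 source: that get("small") returns only the first stored value ("example"), and that term vector offsets for a multi-valued untokenized field continue across values instead of restarting at 0. The names OffsetMismatchSketch, Span, accumulateOffsets and firstOutOfRange are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetMismatchSketch {

    /** Hypothetical stand-in for one term vector entry: a term and its character offsets. */
    static final class Span {
        final String term;
        final int start;
        final int end;

        Span(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * ASSUMPTION: offsets of a multi-valued untokenized field keep counting
     * across values, so each value starts where the previous one ended.
     */
    static List<Span> accumulateOffsets(String... values) {
        List<Span> spans = new ArrayList<Span>();
        int offset = 0;
        for (String value : values) {
            spans.add(new Span(value, offset, offset + value.length()));
            offset += value.length();
        }
        return spans;
    }

    /** Returns the first span whose end lies beyond the text, or null if every span fits. */
    static Span firstOutOfRange(String text, List<Span> spans) {
        for (Span span : spans) {
            if (span.end > text.length()) {
                return span;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Span> spans = accumulateOffsets("example", "long", "text");

        // ASSUMPTION: only the first value of the multi-valued field is
        // handed to the highlighter as the text to cut fragments from.
        String text = "example";

        for (Span span : spans) {
            try {
                // A highlighter has to slice the text by token offsets like this.
                String piece = text.substring(span.start, span.end);
                System.out.println(span.term + " -> \"" + piece + "\"");
            } catch (StringIndexOutOfBoundsException e) {
                System.out.println(span.term + " [" + span.start + "," + span.end
                        + ") does not fit in a text of length " + text.length());
            }
        }
    }
}
```

Under these assumptions the second value, "long", gets the offsets 7 to 11 while the text is only 7 characters long, which would be consistent with the "String index out of range: 11" in the stack trace. If this is what happens, one way to test it would be to pass the highlighter a text that covers the same offsets as the token stream, for example the three stored values concatenated.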