Mark,
thank you for this. I will wait for your further responses.
This will keep me going :-)

I didn't know that Lucene has a design restriction requiring the text and
the TokenStream to match exactly (this still seems redundant to me, so I
will dig deeper into the Lucene API).
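
To make the restriction concrete for myself, here is a minimal sketch in plain Java with no Lucene classes involved. It assumes that a highlighter slices the original text using the character offsets carried by the tokens, and that the offsets of the three "small" values are laid end to end ("example" at 0-7, "long" at 7-11). Both the class name and those offsets are my own illustrative assumptions, not something taken from the Lucene sources.

```java
public class OffsetMismatchSketch {

    /** Wraps text[start, end) in <B> tags by slicing the text with the given offsets. */
    static String highlight(String text, int start, int end) {
        return text.substring(0, start)
                + "<B>" + text.substring(start, end) + "</B>"
                + text.substring(end);
    }

    /** True if the offsets can be sliced out of the text without running past its end. */
    static boolean fits(String text, int start, int end) {
        return 0 <= start && start <= end && end <= text.length();
    }

    public static void main(String[] args) {
        // Only the first "small" value, as returned by get("small") on a multi-valued field.
        String firstValueOnly = "example";

        // Offsets 0-7 lie inside the text, so slicing works.
        System.out.println(highlight(firstValueOnly, 0, 7));

        // Assumed offsets 7-11 (the value "long") lie outside the 7-character text.
        System.out.println(fits(firstValueOnly, 7, 11));
        try {
            highlight(firstValueOnly, 7, 11);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("offsets run past the text: " + e.getMessage());
        }
    }
}
```

If this picture is right, the text passed to the highlighter has to be the same text the token offsets were computed against, which would explain the exception in my test case.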

BR
Lukas

On 7/29/07, Mark Miller <[EMAIL PROTECTED]> wrote:
>
> I am going to try to write up some more info for you tomorrow, but
> just to point out: I do think there is a bug in the way offsets are
> being handled. I don't think this is causing your current problem (what
> I mentioned earlier is), but it will probably cause you problems down
> the road. I will look into this further.
>
> - Mark
>
> Lukas Vlcek wrote:
> > Hi Lucene experts,
> >
> > The following simple Lucene code throws a
> > StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0
> > release. Can anyone tell me what is wrong with this code? Is this a bug
> > or a feature of Lucene? Any comments/hints are highly welcome!
> >
> > In a nutshell, I have a document with two (or four) fields:
> > 1) all
> > 2-4) small
> >
> > I use [all] for searching and [small] for highlighting.
> >
> > [package and imports truncated...]
> >
> > public class MemoryIndexCase {
> >     static public void main(String[] arg) {
> >
> >         Document doc = new Document();
> >
> >         doc.add(new Field("all","example long text",
> >                 Field.Store.NO, Field.Index.TOKENIZED));
> >         doc.add(new Field("small","example",
> >                 Field.Store.YES, Field.Index.UN_TOKENIZED,
> >                 Field.TermVector.WITH_POSITIONS_OFFSETS));
> >         doc.add(new Field("small","long",
> >                 Field.Store.YES, Field.Index.UN_TOKENIZED,
> >                 Field.TermVector.WITH_POSITIONS_OFFSETS));
> >         doc.add(new Field("small","text",
> >                 Field.Store.YES, Field.Index.UN_TOKENIZED,
> >                 Field.TermVector.WITH_POSITIONS_OFFSETS));
> >
> >         try {
> >             Directory idx = new RAMDirectory();
> >             IndexWriter writer =
> >                 new IndexWriter(idx, new StandardAnalyzer(), true);
> >
> >             writer.addDocument(doc);
> >             writer.optimize();
> >             writer.close();
> >
> >             Searcher searcher = new IndexSearcher(idx);
> >
> >             QueryParser qp =
> >                 new QueryParser("all", new StandardAnalyzer());
> >             Query query = qp.parse("example text");
> >             Hits hits = searcher.search(query);
> >
> >             Highlighter highlighter =
> >                 new Highlighter(new QueryScorer(query));
> >
> >             IndexReader ir = IndexReader.open(idx);
> >             for (int i = 0; i < hits.length(); i++) {
> >
> >                 String text = hits.doc(i).get("small");
> >
> >                 TermFreqVector tfv =
> >                     ir.getTermFreqVector(hits.id(i), "small");
> >                 TokenStream tokenStream =
> >                     TokenSources.getTokenStream((TermPositionVector) tfv);
> >
> >                 String result =
> >                     highlighter.getBestFragment(tokenStream,text);
> >                 System.out.println(result);
> >             }
> >
> >         } catch (Throwable e) {
> >             e.printStackTrace();
> >         }
> >     }
> > }
> >
> > The exception is:
> > java.lang.StringIndexOutOfBoundsException: String index out of range: 11
> >     at java.lang.String.substring(String.java:1935)
> >     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
> >     at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:175)
> >     at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:101)
> >     at org.lucenetest.MemoryIndexCase.main(MemoryIndexCase.java:70)
> >
> > Best regards,
> > Lukas
> >
> >
>
