Mark,
thank you for this. I will wait for your further responses.
This will keep me going :-)
I didn't know about the design restriction in Lucene that the text and the
TokenStream must correspond exactly (this still seems redundant to me; I
will dive into the Lucene API more).
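
To check that I understand the restriction, here is a small sketch in plain
Java with no Lucene classes involved. The names Tok, offsetsFor and highlight
are made up for illustration, and the assumption that the offsets of a
multi-valued field keep counting across its values is only my guess, not
confirmed Lucene behaviour. It shows how slicing a text by token offsets
fails once the offsets were computed over more text than is passed in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OffsetMismatchSketch {

    /** Hypothetical stand-in for a token carrying character offsets. */
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Assumption: offsets keep counting across the values of one field. */
    static List<Tok> offsetsFor(String... values) {
        List<Tok> toks = new ArrayList<Tok>();
        int base = 0;
        for (String v : values) {
            toks.add(new Tok(v, base, base + v.length()));
            base += v.length();
        }
        return toks;
    }

    /** Wraps query terms in <B> tags by slicing the text at token offsets. */
    static String highlight(String text, List<Tok> toks, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Tok t : toks) {
            out.append(text.substring(last, t.start));      // text between tokens
            String piece = text.substring(t.start, t.end);  // throws if offsets exceed the text
            if (queryTerms.contains(t.term)) {
                out.append("<B>").append(piece).append("</B>");
            } else {
                out.append(piece);
            }
            last = t.end;
        }
        out.append(text.substring(last));                   // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        List<Tok> toks = offsetsFor("example", "long", "text");
        Set<String> query = new HashSet<String>(Arrays.asList("example", "text"));

        // The text covers every offset, so slicing works.
        System.out.println(highlight("examplelongtext", toks, query));

        // The text is only the first value, so later offsets run past its end.
        try {
            highlight("example", toks, query);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("mismatch: " + e.getMessage());
        }
    }
}
```

If this guess is right, then in my test case the text I pass in is shorter
than what the offsets in the term vector describe, which would explain the
exception.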
BR
Lukas
On 7/29/07, Mark Miller <[EMAIL PROTECTED]> wrote:
>
> I am going to try to write up some more info for you tomorrow, but
> just to point out: I do think there is a bug in the way offsets are
> being handled. I don't think this is causing your current problem (what
> I mentioned is), but it will probably cause you problems down the road. I
> will look into this further.
>
> - Mark
>
> Lukas Vlcek wrote:
> > Hi Lucene experts,
> >
> > The following is a simple piece of Lucene code that throws a
> > StringIndexOutOfBoundsException. I am using the official Lucene 2.2.0
> > release. Can anyone tell me what is wrong with this code? Is this a bug
> > or a feature of Lucene? Any comments/hints are highly welcome!
> >
> > In a nutshell, I have a document with two (or four) fields:
> > 1) all
> > 2-4) small
> >
> > I use [all] for searching and [small] for highlighting.
> >
> > [package and imports truncated...]
> >
> > public class MemoryIndexCase {
> >     public static void main(String[] arg) {
> >
> >         Document doc = new Document();
> >
> >         doc.add(new Field("all", "example long text",
> >                 Field.Store.NO, Field.Index.TOKENIZED));
> >         doc.add(new Field("small", "example",
> >                 Field.Store.YES, Field.Index.UN_TOKENIZED,
> >                 Field.TermVector.WITH_POSITIONS_OFFSETS));
> >         doc.add(new Field("small", "long",
> >                 Field.Store.YES, Field.Index.UN_TOKENIZED,
> >                 Field.TermVector.WITH_POSITIONS_OFFSETS));
> >         doc.add(new Field("small", "text",
> >                 Field.Store.YES, Field.Index.UN_TOKENIZED,
> >                 Field.TermVector.WITH_POSITIONS_OFFSETS));
> >
> >         try {
> >             Directory idx = new RAMDirectory();
> >             IndexWriter writer = new IndexWriter(idx, new StandardAnalyzer(), true);
> >
> >             writer.addDocument(doc);
> >             writer.optimize();
> >             writer.close();
> >
> >             Searcher searcher = new IndexSearcher(idx);
> >
> >             QueryParser qp = new QueryParser("all", new StandardAnalyzer());
> >             Query query = qp.parse("example text");
> >             Hits hits = searcher.search(query);
> >
> >             Highlighter highlighter = new Highlighter(new QueryScorer(query));
> >
> >             IndexReader ir = IndexReader.open(idx);
> >             for (int i = 0; i < hits.length(); i++) {
> >
> >                 String text = hits.doc(i).get("small");
> >
> >                 TermFreqVector tfv = ir.getTermFreqVector(hits.id(i), "small");
> >                 TokenStream tokenStream =
> >                         TokenSources.getTokenStream((TermPositionVector) tfv);
> >
> >                 String result = highlighter.getBestFragment(tokenStream, text);
> >                 System.out.println(result);
> >             }
> >
> >         } catch (Throwable e) {
> >             e.printStackTrace();
> >         }
> >     }
> > }
> >
> > The exception is:
> > java.lang.StringIndexOutOfBoundsException: String index out of range: 11
> >     at java.lang.String.substring(String.java:1935)
> >     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:235)
> >     at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:175)
> >     at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:101)
> >     at org.lucenetest.MemoryIndexCase.main(MemoryIndexCase.java:70)
> >
> > Best regards,
> > Lukas