I think you'll have to implement your own Analyzer and count tokens yourself. That is, every call to next() that returns a token also has to increment a counter by 1.
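A minimal sketch of the counting idea, in plain Java with no Lucene dependency so it runs on its own. CountingTokenizer is a hypothetical stand-in for a wrapper around a real TokenStream, and the whitespace splitting is only an assumption for illustration; a real analyzer would tokenize differently. PageOffsets.pageStartOffsets is likewise a made-up helper that records the token offset at which each page begins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Lucene TokenStream: splits on whitespace and
// counts every token that next() hands out.
class CountingTokenizer {
    private final String[] tokens;
    private int index = 0;
    private int count = 0;

    CountingTokenizer(String text) {
        String trimmed = text.trim();
        // Guard the empty case: "".split(...) would yield one empty token.
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Returns the next token, or null when the text is exhausted.
    // Only calls that actually return a token increment the counter.
    String next() {
        if (index >= tokens.length) {
            return null;
        }
        count++;
        return tokens[index++];
    }

    int getCount() {
        return count;
    }
}

class PageOffsets {
    // For each page, records the token offset at which that page begins.
    static List<Integer> pageStartOffsets(List<String> pages) {
        List<Integer> starts = new ArrayList<>();
        int offset = 0;
        for (String page : pages) {
            starts.add(offset);
            CountingTokenizer stream = new CountingTokenizer(page);
            while (stream.next() != null) {
                // A real analyzer would hand the token to the indexer here.
            }
            offset += stream.getCount();
        }
        return starts;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("the quick brown", "fox jumps", "over");
        System.out.println(pageStartOffsets(pages)); // prints [0, 3, 5]
    }
}
```

The sketch assumes a position increment gap of 0 between pages, which is the default that Analyzer.getPositionIncrementGap returns. If your analyzer returns something larger, the gap would have to be added to the running offset after each page.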
To use this, you need some way of knowing when a page ends. At that point you ask your custom analyzer instance what the count is. Alternatively, the analyzer maintains the list of offsets itself and you ask for it after you've added all the pages.

Analyzer.getPositionIncrementGap is called for every additional instance of a field that you add with document.add("field", ...). So you have something like this:

    // hasMorePages() and getPageText() are placeholders for your own code.
    // The Store/Index flags are an assumption; pick what your app needs.
    while (hasMorePages()) {
        String pageData = getPageText();
        doc.add(new Field("text", pageData, Field.Store.NO, Field.Index.TOKENIZED));
    }

Under the covers, your custom analyzer adds the current offset (which you've kept track of) to, say, an ArrayList. After the last page is added, you get this ArrayList and add it to your document.

Or you could just do things twice: send your text through a TokenStream, call next() and count, then send it all through doc.add().

There are probably cleverer ways, but that should do for a start.

Best
Erick

On Jan 24, 2008 2:33 PM, <[EMAIL PROTECTED]> wrote:

> > -----Original Message-----
> > From: Erick Erickson [mailto:[EMAIL PROTECTED]
> > Sent: Friday, 11 January 2008 16:16
> > To: java-user@lucene.apache.org
> > Subject: Re: Design questions
> >
> > But you could also vary this scheme by simply storing in your document
> > the offsets for the beginning of each page.
>
> Well, this is the best for my app I think, but...
> How do I find out these offsets?
> I'm adding the content field with:
> IndexWriter#add(new Field("content", myContentReader));
> I have no clue how to find out the offsets in this reader. Must be something
> with an analyzer and a TokenStream?
>
> Thank you