As luck would have it, I've done something very similar. What I had to do is index a special token at the end of each page. Then I could get the term offsets for each page....
Then I used one of the SpanQuery.getSpans to get all of the offsets of the hits throughout all of the pages. now I have a list of all the offsets of the *last* term on each page and a list of the offsets of the hits. From these two lists I can know which pages have hits. Best Erick On 5/23/07, Andreas Guther <[EMAIL PROTECTED]> wrote:
Hi, If a search returns a document that has multiple fields with the same name, is there a way to filter only those fields that contain hits? Background: I am indexing documents and we store all content in our index for display reasons. We want to show only those pages containing hits. My first implementation was saving each page in a Lucene document. For performance reasons why are now looking into indexing the complete indexed document as a single Lucene document. Every page is added to a field in the Lucene document named page-content. That means I am ending with as many fields named page-content as the document has pages. My search now returns me a single Lucene document in contrary to my first approach with page per Lucene document. My problem right now is: how can I limit the returned page-contents fields for pages to those field entries that contain hits. If I have hits on pages five pages from a document with 10 pages I would like to have only the pages with the hits, not all. Is there anything in Lucene that limits the returned fields to fields with hits only? Thanks in advance, Andreas --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
