As luck would have it, I've done something very similar. What I had
to do was index a special token at the end of each page. Then I could
get the term positions of that token, which tell me where each page
ends.
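Here is a rough sketch of the marker idea. It does not call Lucene at all. It only simulates the positions an analyzer would produce, assuming whitespace tokenization, lowercasing, and a position increment gap of 0 between values of the same field (the default returned by Analyzer.getPositionIncrementGap in Lucene 2.x). The sentinel string PAGE_END and the class and method names are made up for illustration. In a real index you would read the sentinel's positions back with something like IndexReader.termPositions(Term) instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageMarkers {
    // Hypothetical sentinel: pick something the analyzer keeps intact
    // and that never occurs in real content.
    static final String PAGE_END = "xxpageendxx";

    // Appends the sentinel to each page, as you would before adding the
    // page to the multi-valued "page-content" field.
    static List<String> withMarkers(List<String> pages) {
        List<String> out = new ArrayList<>();
        for (String page : pages) {
            out.add(page + " " + PAGE_END);
        }
        return out;
    }

    // Simulates the token stream for all values of one field: whitespace
    // tokens, lowercased, consecutive positions, gap of 0 between values.
    static List<String> tokenStream(List<String> values) {
        List<String> tokens = new ArrayList<>();
        for (String v : values) {
            for (String t : v.trim().split("\\s+")) {
                if (!t.isEmpty()) tokens.add(t.toLowerCase());
            }
        }
        return tokens;
    }

    // Position of the sentinel (the *last* token) on each page, in page order.
    static int[] pageEndPositions(List<String> tokens) {
        List<Integer> ends = new ArrayList<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            if (tokens.get(pos).equals(PAGE_END)) ends.add(pos);
        }
        int[] result = new int[ends.size()];
        for (int i = 0; i < result.length; i++) result[i] = ends.get(i);
        return result;
    }

    public static void main(String[] args) {
        List<String> pages = List.of("apache lucene in action", "span queries", "the end");
        List<String> tokens = tokenStream(withMarkers(pages));
        System.out.println(Arrays.toString(pageEndPositions(tokens))); // [4, 7, 10]
    }
}
```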

Then I used SpanQuery.getSpans() to get the positions of all of the
hits throughout all of the pages.
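A Lucene-free stand-in for the getSpans() step. With a real Spans enumeration you would loop on next() and read doc(), start() and end() for each match. Here the hypothetical hitStarts method just scans the simulated token list for an ordered, adjacent run of terms, which is roughly what a SpanNearQuery with slop 0 and inOrder=true matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HitPositions {
    // Returns the start position of every place where `terms` occurs as an
    // ordered, adjacent sequence in `tokens`.
    static int[] hitStarts(List<String> tokens, List<String> terms) {
        if (terms.isEmpty()) return new int[0];
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start + terms.size() <= tokens.size(); start++) {
            boolean match = true;
            for (int k = 0; k < terms.size(); k++) {
                if (!tokens.get(start + k).equals(terms.get(k))) {
                    match = false;
                    break;
                }
            }
            if (match) starts.add(start);
        }
        int[] result = new int[starts.size()];
        for (int i = 0; i < result.length; i++) result[i] = starts.get(i);
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("apache", "lucene", "in", "action", "xxpageendxx",
                "span", "queries", "xxpageendxx", "lucene", "spans", "xxpageendxx");
        System.out.println(Arrays.toString(hitStarts(tokens, List.of("lucene")))); // [1, 8]
    }
}
```

Because the marker token sits between pages, an adjacent (slop 0) match can never straddle two pages.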

Now I have a list of the positions of the *last* term on each page
and a list of the positions of the hits. From these two lists I can
tell which pages have hits.
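Turning the two lists into page numbers is a binary search: a hit belongs to the first page whose end marker sits at or after the hit's position. A minimal sketch, assuming both arrays hold positions within a single document and the marker positions are sorted (they are, since positions only increase). pagesWithHits is a hypothetical name.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class PagesWithHits {
    // pageEnds: sorted positions of the end-of-page marker, one per page.
    // hitStarts: positions where hits begin.
    // Returns the 0-based page numbers containing at least one hit.
    static SortedSet<Integer> pagesWithHits(int[] pageEnds, int[] hitStarts) {
        SortedSet<Integer> pages = new TreeSet<>();
        for (int hit : hitStarts) {
            int idx = Arrays.binarySearch(pageEnds, hit);
            // If the hit sits exactly on a marker, that marker's index is its
            // page. Otherwise binarySearch returns -(insertionPoint) - 1, and
            // the insertion point is the index of the first marker after the hit.
            int page = idx >= 0 ? idx : -idx - 1;
            // Hits after the last marker belong to no page.
            if (page < pageEnds.length) pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        int[] pageEnds = {4, 7, 10};
        int[] hits = {1, 8, 9};
        System.out.println(pagesWithHits(pageEnds, hits)); // [0, 2]
    }
}
```

Each hit costs one O(log p) lookup for p pages, so this stays cheap even for long documents.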


Best
Erick

On 5/23/07, Andreas Guther <[EMAIL PROTECTED]> wrote:

Hi,

If a search returns a document that has multiple fields with the same
name, is there a way to filter only those fields that contain hits?


Background:

I am indexing documents, and we store all content in our index for
display reasons.  We want to show only those pages containing hits.  My
first implementation saved each page as its own Lucene document.  For
performance reasons we are now looking into indexing each complete
document as a single Lucene document.

Every page is added to a field in the Lucene document named
page-content.  That means I end up with as many page-content fields
as the document has pages.

My search now returns a single Lucene document, in contrast to my
first approach with one page per Lucene document.  My problem right now
is: how can I limit the returned page-content fields to those entries
that contain hits?  If I have hits on five pages of a document with 10
pages, I would like to get only the pages with the hits, not all of
them.
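Once the page numbers are known, picking out just those pages is plain array indexing over the stored values. A sketch, assuming the stored page-content values are retrieved with Document.getValues("page-content") and come back in the order the pages were added. onlyHitPages is a hypothetical helper, and page numbers are 0-based.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class HitPages {
    // storedPages: the stored "page-content" values in page order.
    // hitPages: 0-based page numbers known to contain hits.
    static List<String> onlyHitPages(String[] storedPages, SortedSet<Integer> hitPages) {
        List<String> result = new ArrayList<>();
        for (int page : hitPages) {
            // Ignore page numbers that fall outside the stored values.
            if (page >= 0 && page < storedPages.length) {
                result.add(storedPages[page]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] stored = {"page one", "page two", "page three"};
        SortedSet<Integer> hits = new TreeSet<>(List.of(0, 2));
        System.out.println(onlyHitPages(stored, hits)); // [page one, page three]
    }
}
```

If an end-of-page marker was appended to the stored text as well, strip it before display.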

Is there anything in Lucene that limits the returned fields to fields
with hits only?

Thanks in advance,

Andreas


