Two things to watch...

1> Think about indexing the special page-end token with a
position increment of 0 (see SynonymAnalyzer in Lucene in
Action). That preserves the sense of phrases across
page breaks.

2> Assembling the span query is tricky. Search the mail archive
for SpanQuery to see an exchange I had with the originator of
this concept. Suffice it to say that converting an ad-hoc query
into a set of SpanQueries is not trivial, but it certainly is doable.
You'd have a much easier time of it, though, if you were able to
control the queries and disallow ad-hoc queries. It all depends
upon the requirements of the application. Any time you can
avoid supporting arbitrary boolean logic for the user input, your
job is easier <G>....

But you should be able to run up a demo with simple queries that
you control to prove out the methodology in any case.....
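
To illustrate point 1 above, here is a minimal sketch in plain Java (no Lucene calls) of how a position increment of 0 on the page-end token leaves the positions of the surrounding terms untouched. The class name, method name, and the marker string "$$pageend$$" are all hypothetical; in a real analyzer the increment would be set on the token itself, and the exact API for that depends on the Lucene version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: simulates how token positions are derived
// from position increments. It is not part of Lucene.
final class PositionSketch {
    // Assumed marker text; any token that cannot occur in real content works.
    static final String PAGE_END = "$$pageend$$";

    private PositionSketch() {}

    // Ordinary tokens advance the position by 1; the page-end marker
    // advances it by markerIncrement. Assumes the marker is never the
    // first token in the stream.
    static List<Integer> positions(List<String> tokens, int markerIncrement) {
        List<Integer> result = new ArrayList<>();
        int position = -1; // so that the first ordinary token lands on 0
        for (String token : tokens) {
            position += token.equals(PAGE_END) ? markerIncrement : 1;
            result.add(position);
        }
        return result;
    }
}
```

With an increment of 0 the marker shares the position of the last term on its page, so the first term of the next page is still exactly one position later and a phrase spanning the page break still matches. With an increment of 1 the marker occupies its own position and opens a gap between the two terms.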

Best
Erick


On 5/23/07, Andreas Guther <[EMAIL PROTECTED]> wrote:

Erick,

Thank you very much for your response.  That sounds very interesting.
Let me do some experimenting to see if I fully understood your solution.
Otherwise I have to come back to you with more questions.

Andreas




-----Original Message-----
From: Erick Erickson [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 23, 2007 12:00 PM
To: java-user@lucene.apache.org
Subject: Re: How to filter fields with hits from result set

As luck would have it, I've done something very similar. What I had
to do is index a special token at the end of each page. Then I could
get the term offsets for each page....

Then I used SpanQuery.getSpans to get the offsets of all
the hits throughout all of the pages.

Now I have a list of the offsets of the *last* term on each
page and a list of the offsets of the hits. From these two
lists I can work out which pages have hits.
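
The comparison of the two lists described above could be sketched roughly as follows. This is plain Java with no Lucene calls; it assumes the page-end offsets and the hit offsets have already been collected (for example from the spans), that the page-end list is sorted in ascending order, and that a hit belongs to the first page whose end marker sits at or after it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: maps hit offsets to zero-based page indexes.
final class PageHits {
    private PageHits() {}

    // pageEndOffsets: ascending offsets of the page-end marker, one per page.
    // hitOffsets: offsets of the query hits, in any order.
    // Returns the sorted, de-duplicated indexes of pages with at least one hit.
    static List<Integer> pagesWithHits(int[] pageEndOffsets, int[] hitOffsets) {
        int[] hits = hitOffsets.clone();
        Arrays.sort(hits);

        List<Integer> pages = new ArrayList<>();
        int page = 0;
        for (int hit : hits) {
            // Skip pages that end before this hit.
            while (page < pageEndOffsets.length && pageEndOffsets[page] < hit) {
                page++;
            }
            if (page == pageEndOffsets.length) {
                break; // hit lies beyond the last marker; ignore it and the rest
            }
            // Record each page only once.
            if (pages.isEmpty() || pages.get(pages.size() - 1) != page) {
                pages.add(page);
            }
        }
        return pages;
    }
}
```

Because both lists are walked in ascending order, the cost is one sort of the hits plus a single pass over each list. The returned indexes can then be used to pick the matching page-content field values from the stored document.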


Best
Erick

On 5/23/07, Andreas Guther <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> If a search returns a document that has multiple fields with the same
> name, is there a way to filter only those fields that contain hits?
>
>
> Background:
>
> I am indexing documents and we store all content in our index for
> display reasons.  We want to show only those pages containing hits.  My
> first implementation saved each page as a separate Lucene document.  For
> performance reasons we are now looking into indexing the complete
> document as a single Lucene document.
>
> Every page is added to a field in the Lucene document named
> page-content.  That means I end up with as many fields named
> page-content as the document has pages.
>
> My search now returns a single Lucene document, in contrast to my
> first approach with one page per Lucene document.  My problem right now is:
> how can I limit the returned page-content fields to those
> field entries that contain hits?  If I have hits on five pages
> of a document with 10 pages, I would like to get only the pages with
> the hits, not all of them.
>
> Is there anything in Lucene that limits the returned fields to fields
> with hits only?
>
> Thanks in advance,
>
> Andreas
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
