Re: Sentence and Paragraph searching

Paul Elschot Fri, 01 Jul 2005 13:06:41 -0700

On Friday 01 July 2005 20:52, McCallie,David wrote:
> 
> Couldn't you use SpanQuery for something like this?  Put special
> <start-of-sentence> and <end-of-sentence> tokens around each sentence,
> and then search for the specific key words inside of the outer SPAN? Do
> the same for paragraphs, sections, etc.
> 
> I tried this once, and it seemed to work.  I'm not sure of the
> performance penalty of the SPAN overhead.
>


It should work, and so should using SpanNotQuery to exclude the
sentence boundary (see my other post). Using a separate
sentence field, in which each token position is mapped to the number
of the sentence it belongs to, would be faster, but that would also
require a special version of PhraseQuery that matches terms from
different fields at the same position.
Paragraphs can be handled similarly.
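
To illustrate the sentence-number idea, here is a minimal sketch in plain
Python. It is not Lucene code: the function names, the in-memory postings
layout and the naive sentence splitter are all assumptions made for the
example. It only shows how a per-position sentence number lets a search
keep the hits where all terms fall in the same sentence.

```python
import re
from collections import defaultdict


def index_with_sentence_numbers(text):
    """Build a toy index (hypothetical layout, not Lucene's).

    Returns (postings, sentence_of) where postings maps each term to its
    token positions and sentence_of[position] is the sentence number of
    the token at that position.
    """
    postings = defaultdict(list)
    sentence_of = []
    position = 0
    # Naive splitter (assumption): a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    for sentence_no, sentence in enumerate(sentences):
        for token in re.findall(r"\w+", sentence.lower()):
            postings[token].append(position)
            sentence_of.append(sentence_no)
            position += 1
    return postings, sentence_of


def sentences_containing_all(postings, sentence_of, terms):
    """Return the sorted sentence numbers in which every term occurs."""
    per_term = []
    for term in terms:
        positions = postings.get(term.lower(), [])
        # Map each hit position to its sentence number.
        per_term.append({sentence_of[p] for p in positions})
    return sorted(set.intersection(*per_term)) if per_term else []


if __name__ == "__main__":
    text = ("Lucene indexes text. Span queries match positions. "
            "Lucene supports span queries.")
    postings, sentence_of = index_with_sentence_numbers(text)
    print(sentences_containing_all(postings, sentence_of, ["lucene", "span"]))
```

In this sketch the sentence numbers live in a side list rather than in a
second indexed field, so it sidesteps the duplicated term index but also
does not show how a real same-position query would be implemented.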

The disadvantage of adding a new field over the same data
is that the term index is duplicated.
This could be avoided by extending the index format
with index levels: one for normal use, one for sentences, one for
paragraphs, and so on.

Regards,
Paul Elschot

