Re: Search within a sentence (revisited)

darren Wed, 20 Jul 2011 08:33:46 -0700

I just parse the text into sentences and put those in a multi-valued field
and then search that.


On Wed, 20 Jul 2011 11:27:38 -0400, Peter Keegan <peterlkee...@gmail.com>
wrote:
> I have browsed many suggestions on how to implement 'search within a
> sentence', but all seem to have drawbacks. For example, from
>
http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-td1644352.html#a1645072
> 
> Steve Rowe writes:
> 
> ----------
> One common technique, instead of using a larger-than-normal position
> increment gap between sentences, is using a sentence boundary token like
> '$'
> or something else that won't ever itself be the target of search. 
Quoting
> from a post Mark Miller made to the lucene-user list last year <
>
http://www.lucidimagination.com/search/document/c9641cbb1a3bf928/multiline_regex_with_lucene
>>):
> 
>         First you inject special marker tokens as your paragraph/
>         sentence markers, then you use a SpanNotQuery that looks
>         for a SpanNearQuery that doesn't intersect with a
>         SpanTermQuery containing the special marker term.
> 
> Mark's suggestion would work for your within-sentence case, and for the
> case
> where you don't care about sentence boundaries, you can use
SpanNearQuery
> without the SpanNotQuery.
> ----------
> 
> The problem with the last part is that the SpanNearQuery would have to
have
> a slop of 1 in order to accomodate the marker token between sentences.
This
> could result in incorrect matches if the a slop of 0 is intended.
Another
> suggestion was to overlap the marker token with the first or last token
of
> the sentence, but the SpanNotQuery would always exclude any terms in the
> query that are at the intersection.  Mark Miller's 'SpanWithinQuery'
patch
> seems to have the same issue.
> 
> Has anyone implemented a solution that works for both in-sentence and
> across
> sentence boundaries?
> 
> Thanks,
> Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Search within a sentence (revisited)

Reply via email to