I have browsed many suggestions on how to implement 'search within a sentence', but all seem to have drawbacks. For example, from http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-td1644352.html#a1645072
Steve Rowe writes: ---------- One common technique, instead of using a larger-than-normal position increment gap between sentences, is using a sentence boundary token like '$' or something else that won't ever itself be the target of search. Quoting from a post Mark Miller made to the lucene-user list last year < http://www.lucidimagination.com/search/document/c9641cbb1a3bf928/multiline_regex_with_lucene >): First you inject special marker tokens as your paragraph/ sentence markers, then you use a SpanNotQuery that looks for a SpanNearQuery that doesn't intersect with a SpanTermQuery containing the special marker term. Mark's suggestion would work for your within-sentence case, and for the case where you don't care about sentence boundaries, you can use SpanNearQuery without the SpanNotQuery. ---------- The problem with the last part is that the SpanNearQuery would have to have a slop of 1 in order to accomodate the marker token between sentences. This could result in incorrect matches if the a slop of 0 is intended. Another suggestion was to overlap the marker token with the first or last token of the sentence, but the SpanNotQuery would always exclude any terms in the query that are at the intersection. Mark Miller's 'SpanWithinQuery' patch seems to have the same issue. Has anyone implemented a solution that works for both in-sentence and across sentence boundaries? Thanks, Peter