You're right, those cases won't be covered, and probably can't be without some hacking at the NearSpans* classes. The other niggle I've found is that it doesn't play well with highlighting - you get the entire span highlighted, rather than the individual terms within it.
For NOT WITHIN queries, I use the following:

X NOT WITHIN/5 Y -> SpanNotQuery(X, SpanNear(X, Y, 5))

which finds all instances of X, and then removes any that are also within 5 of Y.

On 17 May 2012, at 20:02, Chris Harris wrote:

> First impression is, that's a reasonably clever way to get the user
> intent basically right without having to add a new SpanQuery. Have you
> come up with any edge cases where it could do something unexpected?
>
> So far I've thought of one, though you could argue it has more to do
> with the "minimum/lazy/nonoverlapping match" nature of SpanQuery than
> with your particular implementation of "and": Suppose there's a
> document whose complete text is
>
> B A x A x x x x C
>
> From my hypothetical user's perspective, this should match the query
> [A w/5 (B and C)], because the second "A" is within slop 5 of both B
> and C. However, because SpanNear only does minimum-ish matches, this
> document *won't* match the rewritten query SpanNear(A, spanNear(A, B,
> 5), spanNear(A, C, 5), 0); the only span generated for the SpanNear(A,
> B, 5) subquery will be "B A", and the only span for SpanNear(A, C, 5)
> will be "A x x x x C", and those two are not adjacent, so there's no
> match for the outer SpanNear.
>
> Also, while we're exploring your solution, do you also have a rule to
> cover "not"?
>
> On Thu, May 17, 2012 at 12:58 AM, Alan Woodward
> <[email protected]> wrote:
>> I've just had to implement exactly this - the solution I came up with was to
>> translate:
>>
>> A w/5 (B and C) -> SpanNear(A, spanNear(A, B, 5), spanNear(A, C, 5), 0)
>> A w/5 (B or C) -> OR(spanNear(A, B, 5), spanNear(A, C, 5))
>>
>> More complex queries (such as (A AND B) w/5 (C AND D)) are dealt with by
>> applying the above rules recursively. You do end up with some horribly
>> overcomplicated queries, but it seems to be performant enough.
>>
>> On 17 May 2012, at 04:38, Mike Sokolov wrote:
>>
>>> It sounds to me as if there could be a market for a new kind of query that
>>> would implement:
>>>
>>> A w/5 (B and C)
>>>
>>> in the way that people understand it to mean - the same A near both B and
>>> C, not just any A.
>>>
>>> Maybe it's too hard to implement using rewrites into existing SpanQueries?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
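
A rough sketch of the semantics the users in this thread are asking for, in case it helps when checking the rewrites against test documents. This is plain Python over token positions, not Lucene code, and it says nothing about how the SpanQuery classes generate spans. The helper names (positions, is_near, not_within, within_all) are made up for illustration, and "within N" is assumed to mean at most N tokens strictly between the two terms, which is how the "A x x x x C" example in the thread reads.

```python
def positions(tokens, term):
    """All token positions at which `term` occurs."""
    return [i for i, t in enumerate(tokens) if t == term]


def is_near(pos, others, slop):
    """True if any position in `others` is within `slop` of `pos`.

    Assumption: the distance is the number of tokens strictly between
    the two positions, so adjacent tokens have distance 0.
    """
    return any(abs(pos - o) - 1 <= slop for o in others)


def not_within(tokens, x, y, slop):
    """Intended meaning of 'X NOT WITHIN/slop Y':
    positions of x that have no y within `slop`."""
    ys = positions(tokens, y)
    return [p for p in positions(tokens, x) if not is_near(p, ys, slop)]


def within_all(tokens, a, others, slop):
    """Intended meaning of 'A w/slop (B and C ...)':
    positions of a that are within `slop` of every term in `others`."""
    return [
        p
        for p in positions(tokens, a)
        if all(is_near(p, positions(tokens, o), slop) for o in others)
    ]


if __name__ == "__main__":
    # Chris's document: only the second A (position 3) is near both B and C.
    doc = "B A x A x x x x C".split()
    print(within_all(doc, "A", ["B", "C"], 5))

    # The first X is close to Y and is dropped; the last X is kept.
    doc2 = "X a Y b c d e f g h X".split()
    print(not_within(doc2, "X", "Y", 5))
```

Running the rewritten span queries and this brute-force check over the same small documents is one way to spot cases like Chris's, where the user-level expectation and the span matches disagree.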
