On Sun, Mar 21, 2010 at 08:50:10PM -0500, Peter Karman wrote:

> > The current implementation has a limitation I think is probably pretty
> > important: 'b NEAR a' doesn't return the same result set as 'a NEAR b'.
>
> As you noted earlier in this thread, there is no consensus about what a
> proximity query is. :)

Touché!

> I did consider the fact that proximity might imply that order does not matter.
> But I came down here: if I want order to matter, and the ProximityScorer ignores
> order as you're suggesting, then I have no options. I can't limit my search to
> 'a NEAR b'.
>
> If instead we leave the ProximityScorer as is, then this:
>
> (a NEAR b) OR (b NEAR a)
>
> does what you're describing.

Truth. Foolish me didn't realize it had been a conscious choice.

> Consider too:
>
> (a NEAR b NEAR c)
>
> which might be written as:
>
> "a b c"~10
>
> What order should I consider there? 'a' within 10 positions of 'b' and 'c'? or
> 'b' within 10 positions of 'a' and 'c'? or... You see how the possibilities
> multiply.

OK, I can see how the more limited semantics that you chose for ProximityQuery
are actually liberating under many circumstances. We should zap my TODO test.

> I think simpler is better here: if you want order to not matter, then OR
> together the various orders you might be interested in. In fact, I may offer
> that as an option in the Search::Query::Parser, which could then do the ORing
> programmatically. Likewise, if we choose to support the "a b"~N syntax in the KS
> QueryParser, could do something similar.

I'd rather shunt people who need more than the basic syntax of the core
QueryParser towards yours than try to imitate it. :)

> > Superficial stylistic suggestion: I might propose "proximity" (or "nearness",
> > but "proximity" is better) instead of "near" for the name of that parameter.
> > Or alternately, "slop", but I understand why you went with nearness instead.
>
> I like 'proximity' for consistency's sake. And yes, 'near' is not quite right.
> How about 'within'? Or 'vicinity'?

Those all seem fine to me.
I'd cast my vote for "proximity" just because you chose to call the class
"ProximityQuery" and an exact name match seems easiest to remember, but "within"
is a little easier to spell and has a slightly more "natural language" emphasis,
as opposed to the more traditional "noun = value" naming style.

Marvin Humphrey
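
For illustration only, here is a minimal Python sketch of the "OR together the various orders" idea discussed above. It is not the ProximityScorer or either QueryParser: the helper names (token_positions, ordered_near, unordered_near) are hypothetical, and it assumes that an ordered match means the terms appear in the given order with the last term at most `within` positions after the first.

```python
from itertools import permutations


def token_positions(text):
    """Map each lowercased, whitespace-separated token to its positions."""
    positions = {}
    for i, tok in enumerate(text.lower().split()):
        positions.setdefault(tok, []).append(i)
    return positions


def ordered_near(positions, terms, within):
    """True if `terms` occur in the given order, with the last term at most
    `within` positions after the first (an assumed definition, see above)."""

    def search(idx, start, prev):
        if idx == len(terms):
            return True
        for p in positions.get(terms[idx], []):
            if idx == 0:
                # Each occurrence of the first term anchors a candidate match.
                if search(1, p, p):
                    return True
            elif p > prev and p - start <= within:
                if search(idx + 1, start, p):
                    return True
        return False

    return search(0, None, None)


def unordered_near(positions, terms, within):
    """OR together the ordered query for every ordering of `terms`."""
    return any(
        ordered_near(positions, list(perm), within)
        for perm in permutations(terms)
    )


doc = token_positions("b x a")
print(ordered_near(doc, ["a", "b"], 10))    # 'a NEAR b'
print(ordered_near(doc, ["b", "a"], 10))    # 'b NEAR a'
print(unordered_near(doc, ["a", "b"], 10))  # (a NEAR b) OR (b NEAR a)
```

The number of orderings grows factorially with the number of terms, which is one way to see how "the possibilities multiply" for something like "a b c"~10, and why having a parser generate the OR programmatically is attractive.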
