On Mon, Mar 15, 2010 at 10:57:28PM -0500, Peter Karman wrote:
> I'd like to offer a proximity query type in my app, so that I can search like:
>
>   foo NEAR10 bar
>
> to find all instances of 'foo' within 10 token positions of 'bar'.[0]
>
> It seems like the place to start, if I were to take the route of
> subclassing/extending an existing class, is the PhraseQuery feature,
> specifically the PhraseScorer and the internal winnow_anchors() function. Am I
> on the right track here?

As you seem to have noted already, the hard part will be the Matcher class, not
the Query.

Within the existing KS code base, PhraseScorer would be the closest thing to
what you want. It wasn't really built to handle nearness, but maybe it can be
adapted.

If you want to see other prior art, Lucene has SpanNearQuery and SpanScorer:

  http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/spans/SpanNearQuery.java
  http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/spans/SpanScorer.java

Also, Lucene's PhraseScorer takes a "slop" parameter, which KinoSearch's does
not. I forget exactly what it does and how it differs from
SpanNearQuery/SpanScorer.

  http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/PhraseScorer.java

> [0] I believe Lucene syntax for that query is "foo bar"~10

Yes.

  http://lucene.apache.org/java/3_0_1/queryparsersyntax.html#Proximity%20Searches

That '10' is the 'slop' parameter.

Do you have an idea yet as to how you might publish this?

Marvin Humphrey
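
For illustration only, here is a rough sketch of the core check a nearness
matcher would have to make for a single document: given the sorted token
positions of 'foo' and of 'bar', is any 'foo' within N positions of any 'bar'?
This is plain Python, not KinoSearch or Lucene code. The function name
within_distance and its parameters are made up for the example. It treats
nearness as symmetric, ignores scoring, and makes no claim to match the
semantics of Lucene's 'slop'.

```python
def within_distance(positions_a, positions_b, max_distance):
    """Return True if any position in positions_a is within max_distance
    token positions of any position in positions_b.

    Hypothetical helper for illustration. Both lists must be sorted in
    ascending order, as position lists in a posting typically are.
    """
    i = j = 0
    while i < len(positions_a) and j < len(positions_b):
        a, b = positions_a[i], positions_b[j]
        if abs(a - b) <= max_distance:
            return True
        # Advance whichever list is behind; moving the other one could
        # only widen the gap.
        if a < b:
            i += 1
        else:
            j += 1
    return False


# Made-up positions of 'foo' and 'bar' within one document.
foo_positions = [3, 40, 95]
bar_positions = [20, 47, 200]
print(within_distance(foo_positions, bar_positions, 10))
```

In the sample data the closest pair is 40 and 47, so the check succeeds for
N = 10 and would fail for N = 5. A real Matcher would also need to walk the
posting lists document by document and produce a score, which this sketch
leaves out.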
