On Mon, Mar 15, 2010 at 10:57:28PM -0500, Peter Karman wrote:
> I'd like to offer a proximity query type in my app, so that I can search like:
> 
>  foo NEAR10 bar
> 
> to find all instances of 'foo' within 10 token positions of 'bar'.[0]
> 
> It seems like the place to start, if I were to take the route of
> subclassing/extending an existing class, is the PhraseQuery feature,
> specifically the PhraseScorer and the internal winnow_anchors() function. Am I
> on the right track here?

As you seem to have noted already, the hard part will be the Matcher class,
not the Query.

Within the existing KS code base, PhraseScorer would be the closest thing to
what you want.  It wasn't really built to handle nearness, but maybe it can be
adapted.
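
For illustration only, here is a minimal Python sketch of the core nearness
test, assuming each term's token positions within a single document are
available as ascending lists of integers.  The function name and signature are
hypothetical; this is not KinoSearch's winnow_anchors() nor anything from
Lucene, just the basic two-pointer idea a proximity Matcher would need.

```python
from typing import List


def within_distance(positions_a: List[int],
                    positions_b: List[int],
                    max_distance: int) -> bool:
    """Return True if some position in positions_a lies within
    max_distance token positions of some position in positions_b.

    Assumption: both lists are sorted in ascending order.
    """
    i = j = 0
    while i < len(positions_a) and j < len(positions_b):
        gap = positions_a[i] - positions_b[j]
        if abs(gap) <= max_distance:
            return True
        # Advance whichever list is currently behind; moving the other
        # one could only widen the gap.
        if gap < 0:
            i += 1
        else:
            j += 1
    return False


# Hypothetical positions of 'foo' and 'bar' in one document.
foo_positions = [3, 50]
bar_positions = [40, 100]
print(within_distance(foo_positions, bar_positions, 10))
```

A real Matcher would also have to advance through documents and compute a
score; the sketch covers only the per-document position check.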

If you want to see other prior art, Lucene has SpanNearQuery
and SpanScorer:

http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/spans/SpanNearQuery.java

http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/spans/SpanScorer.java

Also, Lucene's PhraseScorer takes a "slop" parameter, which KinoSearch's does
not.  I forget exactly what it does and how it differs from
SpanNearQuery/SpanScorer.

http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/search/PhraseScorer.java

> [0] I believe Lucene syntax for that query is "foo bar"~10

Yes.  

http://lucene.apache.org/java/3_0_1/queryparsersyntax.html#Proximity%20Searches

That '10' is the 'slop' parameter.

Do you have an idea yet as to how you might publish this?

Marvin Humphrey
