using lucene to find neighbouring points in an n-dimensional space

prasenjit mukherjee Sat, 22 Oct 2011 09:47:26 -0700

My use case is the following:
Given an n-dimensional vector (non-negative components only), find its
closest neighbours. I would like to try this out with Lucene's default
ranking. Here is what a typical document will look like:
<term-id:term-weight> (or <dimension-id:dimension-weight>, same thing)


doc1 = 1245:15 3490:20 8856:20 etc.

As the example above shows, the number of dimensions is high (~50K)
and each vector is short (< 40 entries).
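
To make the document format concrete, here is a minimal plain-Java
sketch (no Lucene involved) that parses an "id:weight" line into a
sparse vector and compares two vectors by brute force. The class name
SparseVectors and the choice of cosine similarity as the closeness
measure are assumptions made for illustration; the post does not say
which distance is intended. Something like this could serve as a small
baseline to check Lucene's ranking against.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class SparseVectors {

    // Parses "1245:15 3490:20 8856:20" into {1245=15.0, 3490=20.0, 8856=20.0}.
    // Weights of a repeated dimension id are summed.
    static Map<Integer, Double> parse(String doc) {
        Map<Integer, Double> vector = new HashMap<>();
        for (String pair : doc.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue; // empty input line
            }
            String[] parts = pair.split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected id:weight, got " + pair);
            }
            vector.merge(Integer.parseInt(parts[0]),
                         Double.parseDouble(parts[1]),
                         Double::sum);
        }
        return vector;
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<Integer, Double> v) {
        double sumOfSquares = 0.0;
        for (double x : v.values()) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity (assumed measure); returns 0.0 if either vector is empty.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // only shared dimensions contribute
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }
}
```

Since each vector has fewer than 40 entries, a single comparison is
cheap; the cost of brute force comes from the number of documents, which
is where an inverted index would help.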

I am thinking of constructing a BooleanQuery in the following way
(using doc1 as the query):

// one optional (SHOULD) clause per non-zero dimension of doc1
BooleanQuery bq = new BooleanQuery();
bq.add(new TermQuery(new Term("field", "1245")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("field", "3490")), BooleanClause.Occur.SHOULD);
bq.add(new TermQuery(new Term("field", "8856")), BooleanClause.Occur.SHOULD);

The problem is how to pass the dimension value (15, 20, 20, etc.)
in the TermQuery.

One solution is to add as many TermQueries as the dimension value,
but I was wondering whether there is a better way to pass the
dimension weight. I can probably do the same during indexing, as
latency is not an issue at indexing time.
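
The repetition idea for indexing time could be sketched roughly as
below: each "id:weight" pair is expanded into the id repeated weight
times, so that the term frequency in the indexed field carries the
weight. This is plain Java with no Lucene calls; the class name
WeightedTermExpander is hypothetical, and integer weights are assumed.
One caveat, as far as I recall: Lucene's default similarity dampens term
frequency (a square root), so the score would probably not grow linearly
with the weight.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class WeightedTermExpander {

    // Expands "1245:2 3490:3" into "1245 1245 3490 3490 3490".
    // The result can be fed to a whitespace-tokenized field.
    static String expand(String weightedDoc) {
        List<String> tokens = new ArrayList<>();
        for (String pair : weightedDoc.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue; // empty input line
            }
            String[] parts = pair.split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected id:weight, got " + pair);
            }
            int weight = Integer.parseInt(parts[1]); // assumes integer weights
            for (int i = 0; i < weight; i++) {
                tokens.add(parts[0]); // repeat the term id 'weight' times
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(expand("1245:2 3490:3"));
    }
}
```

With weights like 15 or 20 and fewer than 40 dimensions per vector, the
expanded field stays below a thousand tokens per document, so the extra
index size should be tolerable.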

Any help is greatly appreciated.

-Thanks,
Prasenjit

