Any pointers or suggestions on my approach?
On 10/22/11, prasenjit mukherjee <prasen....@gmail.com> wrote:
> My use case is the following:
> Given an n-dimensional vector (points in the positive quadrant only),
> find its closest neighbours. I would like to try this with Lucene's
> default ranking. Here is what a typical document looks like, as
> <term-id:term-weight> pairs (or <dimension-id:dimension-weight>,
> same thing):
>
> doc1 = 1245:15 3490:20 8856:20 etc.
>
> As the example shows, the number of dimensions is high (~50K) and
> the vectors are short (< 40 entries).
>
> I am thinking of constructing a BooleanQuery in the following way
> (using doc1 as the query):
>
> BooleanQuery bq = new BooleanQuery();
> bq.add(new TermQuery(new Term("field", "1245")), BooleanClause.Occur.SHOULD);
> bq.add(new TermQuery(new Term("field", "3490")), BooleanClause.Occur.SHOULD);
> bq.add(new TermQuery(new Term("field", "8856")), BooleanClause.Occur.SHOULD);
>
> The problem is how to pass the dimension values (15, 20, 20, etc.)
> to the TermQuery.
>
> One solution is to add each TermQuery as many times as its dimension
> value, but I was wondering whether there is a better way to pass the
> dimension weight. I can probably do the same during indexing, since
> latency is not an issue at indexing time.
>
> Any help is greatly appreciated.
>
> -Thanks,
> Prasenjit
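
To make the intended scoring concrete, here is a minimal plain-Java sketch (no Lucene involved) that parses the doc1 format quoted above and computes the weighted overlap between a query vector and a document vector. The class and method names (SparseVectorSketch, parse, dot) are hypothetical and only for illustration. A brute-force score like this could serve as a baseline when checking whatever ranking Lucene produces. It does not reproduce Lucene's default scoring, which also applies its own tf/idf and normalization factors.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
final class SparseVectorSketch {

    // Parse "1245:15 3490:20 8856:20" into a map of term-id -> weight.
    static Map<String, Float> parse(String doc) {
        Map<String, Float> vector = new HashMap<String, Float>();
        for (String pair : doc.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue; // empty input yields an empty vector
            }
            String[] parts = pair.split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected term-id:weight, got: " + pair);
            }
            vector.put(parts[0], Float.parseFloat(parts[1]));
        }
        return vector;
    }

    // Dot product over the dimensions the two sparse vectors share.
    // Assumption: on the Lucene 3.x side, the query weight for each
    // dimension would be supplied per clause via Query.setBoost(float)
    // on the TermQuery; that mapping is not exercised here.
    static float dot(Map<String, Float> query, Map<String, Float> doc) {
        float sum = 0f;
        for (Map.Entry<String, Float> e : query.entrySet()) {
            Float docWeight = doc.get(e.getKey());
            if (docWeight != null) {
                sum += e.getValue() * docWeight;
            }
        }
        return sum;
    }
}
```

With doc1 as the document and a query of "1245:15 3490:20", dot() sums 15*15 + 20*20 over the two shared dimensions and ignores 8856, which the query does not contain.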