Thanks for responding.

On Fri, Oct 28, 2011 at 1:12 AM, Felipe Hummel <felipehum...@gmail.com> wrote:
> For the indexing part, you can 'insert' the term multiple times (term-weight
> times) constructing the document String manually. That is not very typical,
> you would normally feed Lucene with the original documents for it to parse
> and index.
> The query processing could be done similar as you said.
>
> Just be assured that you really want to use Lucene for this. If you already
> have the term-vectors maybe you could just implement the closest
> neighbours calculation by yourself. Just compare your target document with
> every other in the dataset and rank by similarity.
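
To make the two suggestions above concrete, here is a rough plain-Java sketch with no Lucene dependency. It is only an illustration under some assumptions: the class and method names (WeightedVectorSketch, toRepeatedTermText, cosine, rank) are made up for this example, cosine similarity is just one possible choice of similarity measure, and doc2/doc3 are invented sample data (only doc1 comes from my original mail). Note also that repeating a term only raises its term frequency in the index, and Lucene's default similarity dampens that with a square root, so the resulting scores would not be identical to a cosine over the raw weights.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class WeightedVectorSketch {

    // Builds the text to index by repeating each term id "weight" times,
    // e.g. {7=2, 9=1} becomes "7 7 9".
    static String toRepeatedTermText(Map<Integer, Integer> vector) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : vector.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    // Cosine similarity between two sparse vectors (term id -> weight).
    static double cosine(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<Integer, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        for (int w : b.values()) normB += (double) w * w;
        if (normA == 0 || normB == 0) return 0.0;
        return dot / Math.sqrt(normA * normB);
    }

    // Brute force: score every document against the query and
    // return the document ids sorted from most to least similar.
    static List<String> rank(Map<Integer, Integer> query,
                             Map<String, Map<Integer, Integer>> docs) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator
                .comparingDouble((String id) -> cosine(query, docs.get(id)))
                .reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> doc1 = new LinkedHashMap<>();
        doc1.put(1245, 15); doc1.put(3490, 20); doc1.put(8856, 20);
        Map<Integer, Integer> doc2 = new LinkedHashMap<>(); // invented sample
        doc2.put(1245, 10); doc2.put(9999, 5);
        Map<Integer, Integer> doc3 = new LinkedHashMap<>(); // invented sample
        doc3.put(3490, 20); doc3.put(8856, 20);

        Map<String, Map<Integer, Integer>> docs = new LinkedHashMap<>();
        docs.put("doc1", doc1); docs.put("doc2", doc2); docs.put("doc3", doc3);

        System.out.println(toRepeatedTermText(doc2)); // text one would index
        System.out.println(rank(doc1, docs));         // neighbours of doc1
    }
}
```

The brute-force ranking touches every document for every query, which is fine for a small dataset but is exactly the cost an inverted index avoids.
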

My main incentive for using Lucene/Solr is that it already does this in a
much more scalable way. I am assuming this approach does not add much
overhead.

-Thanks,
Prasenjit

>
>
> Felipe Hummel
>
>
> On Sun, Oct 23, 2011 at 9:33 PM, prasenjit mukherjee
> <prasen....@gmail.com> wrote:
>
>> Any pointers/suggestions on my approach?
>>
>>
>> On 10/22/11, prasenjit mukherjee <prasen....@gmail.com> wrote:
>> > My use case is the following:
>> > Given an n-dimensional vector (only +ve quadrants/points), find its
>> > closest neighbours. I would like to try this out with Lucene's default
>> > ranking. Here is what a typical document will look like:
>> > <term-id:term-weight> (or <dimension-id:dimension-weight>, same thing)
>> >
>> > doc1 = 1245:15 3490:20 8856:20 etc.
>> >
>> > As reflected in the above example, the number of dimensions is high
>> > (~50K) and the vectors are short (< 40 entries).
>> >
>> > I am thinking of constructing a BooleanQuery in the following way
>> > (for doc1 as the query):
>> >
>> > BooleanQuery bq = new BooleanQuery();
>> > bq.add(new TermQuery(new Term("field", "1245")),
>> >        BooleanClause.Occur.SHOULD);
>> > bq.add(new TermQuery(new Term("field", "3490")),
>> >        BooleanClause.Occur.SHOULD);
>> > bq.add(new TermQuery(new Term("field", "8856")),
>> >        BooleanClause.Occur.SHOULD);
>> >
>> > The problem is how to pass the dimension value (15, 20, 20 etc.)
>> > in the TermQuery.
>> >
>> > One solution is to add as many TermQueries as the dimension value,
>> > but I was wondering whether there is a better way to pass the
>> > dimension weight. I can probably do the same during indexing, as
>> > latency is not an issue at indexing time.
>> >
>> > Any help is greatly appreciated.
>> >
>> > -Thanks,
>> > Prasenjit
>> >
>>
>> --
>> Sent from my mobile device
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>