Hi Otis, thanks for the answer - I'm aware of Solr, but it seems this is - according to its abstraction level - too generalized for us. Solr seems to be nice in the case you want to use the black box, and won't be aware of 'what is under the hood'. But maybe I'm totaly wrong. At least, it would be from interest how Solr realizes its distributed search, in the case it makes something different than using the core-Lucene ParallelMultiSearcher with RemoteSearchables. Maybe on this list somebody knows the answer.
On Tue, 4 Aug 2009 07:20:23 -0700 (PDT) Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote: > Hi Christian, > > You didn't mention Solr, so I'm not sure if you are aware of it. Maybe Solr > meets your needs? > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > ----- Original Message ---- > > From: Christian Reuschling <christian.reuschl...@gmail.com> > > To: java-user@lucene.apache.org > > Sent: Tuesday, August 4, 2009 5:50:16 AM > > Subject: ParallelMultiSearcher and idf > > > > Hello, > > > > when searching over multiple indices, we create one IndexReader for each > > index, and wrap them into a MultiReader, that we use for IndexSearcher > > creation. > > > > This is fine for searching multiple indices on one machine, but in the case > > the indices are distributed over the (intra)net, this scenario has several > > lacks: > > > > - searching/scoring/sorting is 100% on the client machine, so you need all > > the ram and cpu power at every client. > > - all the data necessary for scoring must go over the net - so the traffic > > should be significantly higher > > - thus, there is a lack of overall performance > > > > Nevertheless, creating a MultiReader and making a searcher out of it has one > > advantage (at least can be an advantage depending on the scenario): The > > document freqiencies of a term will be summed up, and thus it is 100% > > transparent for scoring whether the indices are splittet or not. > > > > I'm wondering whether there is the possibility to get the advantages of both > > scenarios, e.g. by first summing up the query terms-related document > > frequencies, and sending them together with the query to every > > (remote)searcher of ParallelMultiSearcher, for scoring. > > > > Maybe this is exactly what ParallelMultiSearcher does, and I haven't seen > > it? > > > > > > Thanks for clarification! > > > > Chris > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org >
signature.asc
Description: PGP signature