Hey Christian, you might wanna look at distributed solr (http://wiki.apache.org/solr/DistributedSearch) or if you haven't done so have a look at the Katta project (http://katta.sourceforge.net/documentation/how-katta-works) maybe this can help you out. About distributed IDF and Scoring have a look at this link: http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html
Simon On Tue, Aug 4, 2009 at 5:00 PM, Christian Reuschling<christian.reuschl...@gmail.com> wrote: > Hi Otis, > > thanks for the answer - I'm aware of Solr, but it seems this is - according to > its abstraction level - too generalized for us. Solr seems to be nice in the > case you want to use the black box, and won't be aware of 'what is under the > hood'. > But maybe I'm totaly wrong. At least, it would be from interest how Solr > realizes its distributed search, in the case it makes something different > than using the core-Lucene ParallelMultiSearcher with RemoteSearchables. Maybe > on this list somebody knows the answer. > > > > > > On Tue, 4 Aug 2009 07:20:23 -0700 (PDT) > Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote: > >> Hi Christian, >> >> You didn't mention Solr, so I'm not sure if you are aware of it. Maybe Solr >> meets your needs? >> >> Otis >> -- >> Sematext is hiring -- http://sematext.com/about/jobs.html?mls >> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR >> >> >> >> ----- Original Message ---- >> > From: Christian Reuschling <christian.reuschl...@gmail.com> >> > To: java-user@lucene.apache.org >> > Sent: Tuesday, August 4, 2009 5:50:16 AM >> > Subject: ParallelMultiSearcher and idf >> > >> > Hello, >> > >> > when searching over multiple indices, we create one IndexReader for each >> > index, and wrap them into a MultiReader, that we use for IndexSearcher >> > creation. >> > >> > This is fine for searching multiple indices on one machine, but in the case >> > the indices are distributed over the (intra)net, this scenario has several >> > lacks: >> > >> > - searching/scoring/sorting is 100% on the client machine, so you need all >> > the ram and cpu power at every client. >> > - all the data necessary for scoring must go over the net - so the traffic >> > should be significantly higher >> > - thus, there is a lack of overall performance >> > >> > Nevertheless, creating a MultiReader and making a searcher out of it has >> > one >> > advantage (at least can be an advantage depending on the scenario): The >> > document freqiencies of a term will be summed up, and thus it is 100% >> > transparent for scoring whether the indices are splittet or not. >> > >> > I'm wondering whether there is the possibility to get the advantages of >> > both >> > scenarios, e.g. by first summing up the query terms-related document >> > frequencies, and sending them together with the query to every >> > (remote)searcher of ParallelMultiSearcher, for scoring. >> > >> > Maybe this is exactly what ParallelMultiSearcher does, and I haven't seen >> > it? >> > >> > >> > Thanks for clarification! >> > >> > Chris >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org