Re: ParallelMultiSearcher and idf

Simon Willnauer Wed, 05 Aug 2009 00:59:21 -0700

Hey Christian,

you might wanna look at distributed solr
(http://wiki.apache.org/solr/DistributedSearch) or if you haven't done
so have a look at the Katta project
(http://katta.sourceforge.net/documentation/how-katta-works) maybe
this can help you out.
About distributed IDF and Scoring have a look at this link:
http://wunderwood.org/most_casual_observer/2007/04/progressive_reranking.html


Simon

On Tue, Aug 4, 2009 at 5:00 PM, Christian
Reuschling<[email protected]> wrote:
> Hi Otis,
>
> thanks for the answer - I'm aware of Solr, but it seems this is - according to
> its abstraction level - too generalized for us. Solr seems to be nice in the
> case you want to use the black box, and won't be aware of 'what is under the
> hood'.
> But maybe I'm totaly wrong. At least, it would be from interest how Solr
> realizes its distributed search, in the case it makes something different
> than using the core-Lucene ParallelMultiSearcher with RemoteSearchables. Maybe
> on this list somebody knows the answer.
>
>
>
>
>
> On Tue, 4 Aug 2009 07:20:23 -0700 (PDT)
> Otis Gospodnetic <[email protected]> wrote:
>
>> Hi Christian,
>>
>> You didn't mention Solr, so I'm not sure if you are aware of it.  Maybe Solr
>> meets your needs?
>>
>>  Otis
>> --
>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>
>>
>>
>> ----- Original Message ----
>> > From: Christian Reuschling <[email protected]>
>> > To: [email protected]
>> > Sent: Tuesday, August 4, 2009 5:50:16 AM
>> > Subject: ParallelMultiSearcher and idf
>> >
>> > Hello,
>> >
>> > when searching over multiple indices, we create one IndexReader for each
>> > index, and wrap them into a MultiReader, that we use for IndexSearcher
>> > creation.
>> >
>> > This is fine for searching multiple indices on one machine, but in the case
>> > the indices are distributed over the (intra)net, this scenario has several
>> > lacks:
>> >
>> > - searching/scoring/sorting is 100% on the client machine, so you need all
>> > the ram and cpu power at every client.
>> > - all the data necessary for scoring must go over the net - so the traffic
>> >   should be significantly higher
>> > - thus, there is a lack of overall performance
>> >
>> > Nevertheless, creating a MultiReader and making a searcher out of it has 
>> > one
>> > advantage (at least can be an advantage depending on the scenario): The
>> > document freqiencies of a term will be summed up, and thus it is 100%
>> > transparent for scoring whether the indices are splittet or not.
>> >
>> > I'm wondering whether there is the possibility to get the advantages of 
>> > both
>> > scenarios, e.g. by first summing up the query terms-related document
>> > frequencies, and sending them together with the query to every
>> > (remote)searcher of ParallelMultiSearcher, for scoring.
>> >
>> > Maybe this is exactly what ParallelMultiSearcher does, and I haven't seen
>> > it?
>> >
>> >
>> > Thanks for clarification!
>> >
>> > Chris
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: ParallelMultiSearcher and idf

Reply via email to