On Thursday 13 January 2005 01:19, Chuck Williams wrote:
> I think there is another problem here.  It is currently the Weight
> implementations that do rewrite(), which requires access to the index,
> not just to the idf's.  E.g., RangeQuery.rewrite() must find the terms
> in the index within the range.  So, the Weight cannot be computed in the
> MultiSearcher, as it does not have direct access to the remote index.
>
> This seems to put the viability of the whole approach into question.
> The better approach may be to distribute an aggregate docFreq table to
> each remote node.  A simple interim step could be to support a callback
> to the dispatcher node from docFreq on the remote node, although this
> would be gross (remote node calls dispatcher node to get docFreq which
> in turn calls all remote nodes to get all their docFreqs and sum them).
> 
> We need an aggregate docFreq table, and it needs to be on the remote
> nodes since the Weights cannot be computed until after the Query is
> rewritten, which requires access to the index on the remote node.

An alternative is to rewrite against a central cache, which is possible
because it contains all terms and their total document frequencies.
After that all terms and their weights can be sent to the remote searchers,
which can then drop the terms that they don't have.
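
To make the idea above concrete, here is a minimal sketch in plain Java.
Nothing in it is Lucene API: the class and method names (CentralRewriteSketch,
aggregate, rewriteRange, dropMissing) are made up for illustration, the
cache is assumed to be a sorted map from term text to summed docFreq, and
the idf formula (in the style of DefaultSimilarity) only stands in for
whatever the real Weight would compute.

```java
import java.util.*;

public class CentralRewriteSketch {

    /** Sum the per-node docFreq tables into one aggregate table (term -> total docFreq). */
    public static TreeMap<String, Integer> aggregate(List<Map<String, Integer>> perNode) {
        TreeMap<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> node : perNode) {
            for (Map.Entry<String, Integer> e : node.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    /** idf in the style of DefaultSimilarity: ln(numDocs / (docFreq + 1)) + 1. */
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Central rewrite: expand [lower, upper] against the cache and attach a global idf. */
    public static Map<String, Double> rewriteRange(TreeMap<String, Integer> cache,
            String lower, String upper, int totalDocs) {
        Map<String, Double> weighted = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : cache.subMap(lower, true, upper, true).entrySet()) {
            weighted.put(e.getKey(), idf(e.getValue(), totalDocs));
        }
        return weighted;
    }

    /** Remote side: drop the terms this node has not indexed. */
    public static Map<String, Double> dropMissing(Map<String, Double> weighted,
            Set<String> localTerms) {
        Map<String, Double> kept = new LinkedHashMap<>(weighted);
        kept.keySet().retainAll(localTerms);
        return kept;
    }
}
```

The dispatcher would call aggregate() once (or keep the table up to date),
rewriteRange() per query, and each remote node would call dropMissing() on
what it receives.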

If it is possible to send a truncated term (or a range) with a centrally
determined weight to the remote searcher, this would avoid sending all terms
to all remote searchers.
In that case the remote searchers might rewrite again to
select only the terms they have indexed themselves.
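
A rough sketch of that second variant, again in plain Java with
hypothetical names (PrefixShipSketch, WeightedPrefix, expandLocally are
not Lucene classes). It assumes a single centrally determined weight per
truncated term and that the remote node can iterate its own terms in
sorted order:

```java
import java.io.Serializable;
import java.util.*;

public class PrefixShipSketch {

    /** Hypothetical wire format: a truncated term plus one centrally computed weight. */
    public static final class WeightedPrefix implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String prefix;
        public final double weight;

        public WeightedPrefix(String prefix, double weight) {
            this.prefix = prefix;
            this.weight = weight;
        }
    }

    /**
     * Remote side: rewrite the prefix against the local term list only.
     * Terms sharing a prefix are contiguous in sorted order, so the scan
     * starts at the prefix and stops at the first non-matching term.
     */
    public static Map<String, Double> expandLocally(WeightedPrefix wp,
            NavigableSet<String> localTerms) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String term : localTerms.tailSet(wp.prefix, true)) {
            if (!term.startsWith(wp.prefix)) {
                break;
            }
            result.put(term, wp.weight);
        }
        return result;
    }
}
```

Only the prefix and one number cross the wire here; the list of matching
terms never leaves the remote node.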

The question then is whether it is possible to send the query extended with
weights to the remote searchers. Sounds doable to me.

It's losing simplicity, though. OTOH, with a replicated cache, much the same 
thing would need to be done remotely.

Regards,
Paul Elschot.

P.S. Are you sure it is worthwhile to do this?
Term density (and its square root, tf()) vary much more than idf nowadays.

> Chuck
> 
>   > -----Original Message-----
>   > From: Wolf Siberski [mailto:[EMAIL PROTECTED]
>   > Sent: Wednesday, January 12, 2005 4:08 PM
>   > To: Lucene Developers List
>   > Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems with
>   > Similarity.docFreq() ?
>   > 
>   > Doug Cutting wrote:
>   > > Wolf Siberski wrote:
>   > >
>   > >> Chuck Williams wrote:
>   > >>
>   > >>> This is a nice solution!  By having MultiSearcher create the Weight,
>   > >>> it can pass itself in as the searcher, thereby allowing the correct
>   > >>> docFreq() method to be called.  This is similar to what I tried to
>   > >>> do with topmostSearcher, but a much better way to do it.
>   > >>
>   > >> This still wouldn't work for RemoteSearchables, except if you allow
>   > >> call-backs from each RemoteSearchable to the MultiSearcher.
>   > >
>   > > I don't see what callbacks are required.  When the Weight is constructed
>   > > it invokes docFreq for each term, which, if RemoteSearchables are
>   > > involved, will result in IPC calls to those RemoteSearchables.  Then,
>   > > the Weight object is serialized to each RemoteSearchable and a TopDocs
>   > > is returned.  Where are the callbacks?  These are only required for
>   > > HitCollector-based methods, which are not advised with RemoteSearchable.
>   > 
>   > Yes, I agree. I just wanted to point out that the current Weight
>   > implementations need to be modified heavily to introduce the
>   > behaviour you describe above. For example, take a look at
>   > TermQuery.TermWeight.scorer():
>   >     [...]
>   >     return new TermScorer(this, termDocs, getSimilarity(searcher),
>   >                           reader.norms(term.field()));
>   > 
>   > This typically results in a call to searcher.getSimilarity().
>   > In the new context, the searcher would be a MultiSearcher,
>   > and to resolve that call at one of the RemoteSearchables, the
>   > method getSimilarity() would have to be called remotely on it.
>   > In this case, we can change it so that the Weight is provided
>   > with the Similarity object before it is serialized and sent
>   > to the RemoteSearchables. But I'm not sure if all these cases
>   > can be resolved that easily. As you already have pointed out,
>   > it won't be possible for HitCollector-related Weights.
>   > 
>   > But, as I said, I still agree fully with the approach.
>   > 
>   > ---------------------------------------------------------------------
>   > To unsubscribe, e-mail: [EMAIL PROTECTED]
>   > For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 

