Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?

Wolf Siberski Wed, 12 Jan 2005 10:22:15 -0800

Chuck Williams wrote:

> I've read through Wolf's patch and see a few issues (please correct
> anything wrong here):
>   1.  DfMapSimilarity works only with a limited set of queries.[...]
>   2.  The patch hardwires the use of DfMapSimilarity into MultiSearcher.[...]
>   3.  Philosophically, I'm not convinced that Similarities are the right
>       solution.[...]

I agree with all three points.

Regarding point 1, IMHO it will be very difficult to find an
efficient algorithm for all types of queries, because MultiSearcher
doesn't know in advance for which terms the idfs have to be provided,
and we don't want a bidirectional call relationship between MultiSearcher
and RemoteSearchables (or do we?). What we can do is make the framework
flexible enough that each user can trade off efficiency against query
complexity by configuring the MultiSearcher according to their needs.
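To illustrate why the term set is not known in advance, here is a minimal Java sketch under simplified assumptions: each sub-index is modeled as a plain sorted term-to-docFreq map, and the class and method names (PrefixExpansionSketch, index, expandPrefix) are hypothetical, not Lucene's API. A prefix query expands against each index's own term dictionary, so the set of terms needing idfs is only known after every sub-index has been consulted.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

public class PrefixExpansionSketch {

    // Hypothetical stand-in for one sub-index: term -> docFreq, sorted by term.
    static SortedMap<String, Integer> index(Object... termsAndFreqs) {
        SortedMap<String, Integer> m = new TreeMap<>();
        for (int i = 0; i < termsAndFreqs.length; i += 2) {
            m.put((String) termsAndFreqs[i], (Integer) termsAndFreqs[i + 1]);
        }
        return m;
    }

    // Expands a prefix against ONE index's own term dictionary.
    static Set<String> expandPrefix(SortedMap<String, Integer> idx, String prefix) {
        Set<String> out = new TreeSet<>();
        for (String term : idx.tailMap(prefix).keySet()) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> a = index("lucene", 5, "lucid", 2, "search", 9);
        SortedMap<String, Integer> b = index("lucene", 3, "lucky", 1);

        Set<String> fromA = expandPrefix(a, "luc");
        Set<String> fromB = expandPrefix(b, "luc");

        // The full set of terms needing idfs is only known after asking every index.
        Set<String> all = new TreeSet<>(fromA);
        all.addAll(fromB);

        System.out.println("index A: " + fromA);
        System.out.println("index B: " + fromB);
        System.out.println("union:   " + all);
    }
}
```

Gathering the union first and then the docFreqs costs an extra round trip per index; restricting the scheme to queries with a fixed term set avoids that, which is the efficiency vs. query complexity trade-off mentioned above.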

Point 2 can be solved; I just haven't found the right solution yet.

Point 3 is completely right, too. I was looking for a way to make this
work without too much redesign, but Similarity just isn't the right location.

Doug Cutting wrote:

> The root of the bug is in MultiSearcher.search().  This should construct
> a Weight, weight the query, then score the now-weighted query.


Indeed, Weight is the appropriate abstraction, and it is the one that needs to be modified.
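A minimal Java sketch of Doug's proposal, assuming a heavily simplified model: the Searcher, SubSearcher, MultiSearcher and TermWeight classes below are hypothetical stand-ins, not Lucene's real classes. The idf function is modeled on the formula of Lucene's DefaultSimilarity.idf(docFreq, numDocs). It shows that a weight built against each sub-searcher gets a local idf, while one built against the MultiSearcher gets the global idf that all sub-searchers should share.

```java
import java.util.List;
import java.util.Map;

public class MultiSearcherWeightSketch {

    // Hypothetical minimal searcher interface (not Lucene's Searchable).
    interface Searcher {
        int docFreq(String term);
        int maxDoc();
    }

    // One sub-index with fixed term statistics.
    static final class SubSearcher implements Searcher {
        private final Map<String, Integer> dfs;
        private final int maxDoc;
        SubSearcher(Map<String, Integer> dfs, int maxDoc) { this.dfs = dfs; this.maxDoc = maxDoc; }
        public int docFreq(String term) { return dfs.getOrDefault(term, 0); }
        public int maxDoc() { return maxDoc; }
    }

    // Aggregates statistics over all sub-searchers.
    static final class MultiSearcher implements Searcher {
        private final List<Searcher> subs;
        MultiSearcher(List<Searcher> subs) { this.subs = subs; }
        public int docFreq(String term) {
            int sum = 0;
            for (Searcher s : subs) sum += s.docFreq(term);
            return sum;
        }
        public int maxDoc() {
            int sum = 0;
            for (Searcher s : subs) sum += s.maxDoc();
            return sum;
        }
    }

    // Modeled on DefaultSimilarity: log(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Hypothetical term weight: its idf comes from whichever searcher builds it.
    static final class TermWeight {
        final String term;
        final double idf;
        TermWeight(String term, Searcher searcher) {
            this.term = term;
            this.idf = idf(searcher.docFreq(term), searcher.maxDoc());
        }
    }

    public static void main(String[] args) {
        Searcher a = new SubSearcher(Map.of("lucene", 90), 100);
        Searcher b = new SubSearcher(Map.of("lucene", 1), 100);
        Searcher multi = new MultiSearcher(List.of(a, b));

        // Buggy variant: each sub-searcher weights the query with its own statistics.
        System.out.printf("local idf in A: %.4f%n", new TermWeight("lucene", a).idf);
        System.out.printf("local idf in B: %.4f%n", new TermWeight("lucene", b).idf);
        // Proposed variant: one Weight built against the MultiSearcher, then reused.
        System.out.printf("global idf:     %.4f%n", new TermWeight("lucene", multi).idf);
    }
}
```

Note that this only works because TermWeight can call docFreq() on the MultiSearcher directly; with RemoteSearchables that same call becomes a callback.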

Chuck Williams wrote:

> This is a nice solution!  By having MultiSearcher create the Weight, it
> can pass itself in as the searcher, thereby allowing the correct
> docFreq() method to be called.  This is similar to what I tried to do
> with topmostSearcher, but a much better way to do it.


This still wouldn't work for RemoteSearchables unless you allow
callbacks from each RemoteSearchable to the MultiSearcher. For
this, MultiSearcher would have to be remotely callable, too. As I said
above, IMHO we should stay with a simple client/server model here.
From the MultiSearcher's perspective, we just want to query several
information sources instead of one. If this implied that the MultiSearcher
had to expose itself as a server, it would impose too great a demand (IMHO).
Of course, for some applications this might be the way to go, but
I think we shouldn't make it mandatory.

However, to avoid callbacks, the Weight implementations
will need to change significantly, because currently they delegate
(via Query->Similarity) to the Searcher. Instead, the MultiSearcher
would have to provide them with sufficient information (e.g., the
aggregated document frequencies), which is then used directly by the
Weight (in the same manner as DfMapSimilarity works in my patch).
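A minimal Java sketch of the callback-free variant, under simplified assumptions: SubIndex, PrecomputedDfWeight and buildWeight are hypothetical names, not part of Lucene or of the patch. The MultiSearcher side asks each sub-index once for the docFreqs of the query's (known) terms, sums them, and packs them into a self-contained weight that a remote index could use without ever calling back. The score is a toy sqrt(tf) * idf, not Lucene's full scoring formula.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PrecomputedDfSketch {

    // Hypothetical sub-index: term -> docFreq plus a document count.
    static final class SubIndex {
        final Map<String, Integer> dfs;
        final int maxDoc;
        SubIndex(Map<String, Integer> dfs, int maxDoc) { this.dfs = dfs; this.maxDoc = maxDoc; }
        int docFreq(String term) { return dfs.getOrDefault(term, 0); }
    }

    // Hypothetical self-contained weight: carries the global statistics, so it
    // could be shipped to a remote index and used there without a callback.
    static final class PrecomputedDfWeight implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Integer> docFreqs;
        final int numDocs;
        PrecomputedDfWeight(Map<String, Integer> docFreqs, int numDocs) {
            this.docFreqs = docFreqs;
            this.numDocs = numDocs;
        }
        double idf(String term) {
            int df = docFreqs.getOrDefault(term, 0);
            return Math.log(numDocs / (double) (df + 1)) + 1.0;
        }
        // Toy score: sqrt(tf) * idf.
        double score(String term, int tf) { return Math.sqrt(tf) * idf(term); }
    }

    // MultiSearcher side: one docFreq request per sub-index for the query's terms.
    // Only works when the term set is known up front (e.g. plain term queries).
    static PrecomputedDfWeight buildWeight(Set<String> queryTerms, List<SubIndex> subs) {
        Map<String, Integer> dfs = new HashMap<>();
        int numDocs = 0;
        for (SubIndex s : subs) {
            numDocs += s.maxDoc;
            for (String t : queryTerms) dfs.merge(t, s.docFreq(t), Integer::sum);
        }
        return new PrecomputedDfWeight(dfs, numDocs);
    }

    public static void main(String[] args) {
        SubIndex a = new SubIndex(Map.of("lucene", 90), 100);
        SubIndex b = new SubIndex(Map.of("lucene", 1), 100);
        PrecomputedDfWeight w = buildWeight(Set.of("lucene"), List.of(a, b));

        // Both indexes now score a document with the same tf identically.
        System.out.println("aggregated df: " + w.docFreqs);
        System.out.printf("score for tf=4 in either index: %.4f%n", w.score("lucene", 4));
    }
}
```

The design choice here is to move the statistics to the weight instead of moving the call back to the MultiSearcher, which keeps the simple client/server model at the cost of supporting only queries whose terms can be enumerated before searching.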

I'll take a deeper look at the different Weight implementations
in the next few days to see how this could be done.

--Wolf
