Chuck Williams wrote:
Actually, the normalize is a third idf factor (in a different form,
square-rooted in the denominator and summed).

I.e., for a simple BooleanQuery:

score(query, doc) = queryNorm *
sum[ term in query : idf(term)*boost(term)*idf(term)*tf(term, doc)*docNorm(doc) ]

where queryNorm = 1/sqrt( sum[ term in query : (boost(term)*idf(term))^2 ] )

So, only the Scorer terms tf(term, doc) and docNorm(doc) depend on the
doc.  The rest of the computation depends only on the boosts and
idf's, and so can be computed by MultiSearcher augmented with a global
idf table.
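
As a rough illustration of the "global idf table" idea, here is a minimal
Python sketch. It assumes each sub-index can report its document count and
per-term document frequencies, and it assumes an idf of the form
log(N / (df + 1)) + 1. The function name and the input layout are
hypothetical and are not part of any Lucene API.

```python
import math
from collections import Counter

def global_idf(shards):
    """Merge per-index statistics into one idf table.

    shards: list of (num_docs, {term: doc_freq}) pairs, one per sub-index
    (a hypothetical layout chosen for this sketch).
    """
    total_docs = sum(num_docs for num_docs, _ in shards)
    total_df = Counter()
    for _, doc_freqs in shards:
        total_df.update(doc_freqs)  # adds the per-term document frequencies
    # Assumed idf form: log(N / (df + 1)) + 1
    return {t: math.log(total_docs / (df + 1)) + 1.0
            for t, df in total_df.items()}

# Two sub-indexes with different sizes and term statistics.
table = global_idf([(100, {"lucene": 9}),
                    (300, {"lucene": 30, "search": 3})])
```

Every sub-index would then score with the same idf values, whatever its
local term statistics are.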

I.e., to be explicit, the queryNorm could also be factored into the
boost if that implementation is desired.  The MultiSearcher boost could
be all terms in the formula above except for tf(term,doc)*docNorm(doc).
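
The factoring described above can be checked with a small Python sketch.
It assumes queryNorm includes the square root mentioned in the first
paragraph, and all function names are hypothetical. One function computes
the score directly. The other pair folds queryNorm, boost and idf^2 into a
per-term boost up front and leaves only tf(term, doc)*docNorm(doc) for the
local scorer.

```python
import math

def query_norm(terms, idf, boost):
    # Assumed form: 1 / sqrt(sum over terms of (boost * idf)^2)
    return 1.0 / math.sqrt(sum((boost[t] * idf[t]) ** 2 for t in terms))

def score_direct(terms, idf, boost, tf, doc_norm):
    # Full formula evaluated in one place.
    qn = query_norm(terms, idf, boost)
    return qn * sum(idf[t] * boost[t] * idf[t] * tf.get(t, 0.0) * doc_norm
                    for t in terms)

def precomputed_boosts(terms, idf, boost):
    # Everything that does not depend on the document, per term.
    qn = query_norm(terms, idf, boost)
    return {t: qn * boost[t] * idf[t] ** 2 for t in terms}

def score_local(boosts, tf, doc_norm):
    # What a sub-searcher would compute if idf and queryNorm both return 1.
    return sum(b * tf.get(t, 0.0) * doc_norm for t, b in boosts.items())

terms = ["a", "b"]
idf = {"a": 2.0, "b": 1.0}
boost = {"a": 1.0, "b": 2.0}
tf = {"a": 3.0}          # term "b" does not occur in this document
doc_norm = 0.5

direct = score_direct(terms, idf, boost, tf, doc_norm)
factored = score_local(precomputed_boosts(terms, idf, boost), tf, doc_norm)
```

Both paths give the same score up to floating-point rounding, which is
the property the rewriting approach relies on.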

Great. You are right, Chuck. The Similarity specified for the search has
to be modified so that both idf(...) and queryNorm(...) always return 1
and, as you say, everything except tf(term,doc)*docNorm(doc) could be
precomputed into the boosts of the rewritten query. The
coord/tf/sloppyFreq computation would be done locally by the Searchables,
as specified for this search.

So the changes for the MultiSearcher bug would remain local to MultiSearcher.
I think this would be a very clean solution. What do others think?

However, there may be one problem with this approach.  It loses
information that might be necessary for a proposal of mine, which is to
fix Lucene's normalization (again discussed ad nauseam on an earlier
thread).  I'm not sure whether that algorithm could be done in concert
with the boost-based MultiSearcher rewriting approach (and am also not
sure it couldn't).

Re. idf^2, it's the squaring in the numerator that I think is bogus:

This means you would remove the idf contribution from the document vector and leave the one from the query vector?

