Christoph Goller wrote:

  > So the changes for the MultiSearcher bug would remain local to
  > MultiSearcher. I think this would be a very clean solution. What do
  > others think?

Sounds good to me. Wolf is writing a patch to fix this bug, so the choice
may depend on how far he has already gotten and on any other issues he
has discovered while actually writing the code.
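The MultiSearcher-local fix needs idf values computed from document counts summed over all sub-indexes, rather than per-sub-index values. A minimal sketch of such a global idf table follows. The class and method names are hypothetical (not Lucene API), and the idf formula log(numDocs/(docFreq+1)) + 1 is assumed to match Lucene's DefaultSimilarity.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene API): a global idf table for a MultiSearcher.
 * Counts are summed over all sub-indexes before idf is computed.
 */
public class GlobalIdf {

    /** Assumed DefaultSimilarity-style idf: log(numDocs / (docFreq + 1)) + 1. */
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Size of one sub-index and the docFreq of a single term in it. */
    public static final class SubIndex {
        final int numDocs;
        final int docFreq;

        public SubIndex(int numDocs, int docFreq) {
            this.numDocs = numDocs;
            this.docFreq = docFreq;
        }
    }

    /** Global idf: aggregate the counts first, then apply idf once. */
    public static double globalIdf(List<SubIndex> subs) {
        int numDocs = 0;
        int docFreq = 0;
        for (SubIndex s : subs) {
            numDocs += s.numDocs;
            docFreq += s.docFreq;
        }
        return idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        // The term is rare in the first sub-index and common in the second.
        List<SubIndex> subs = Arrays.asList(new SubIndex(1000, 1), new SubIndex(1000, 500));
        for (SubIndex s : subs) {
            System.out.printf("local idf  = %.4f%n", idf(s.docFreq, s.numDocs));
        }
        System.out.printf("global idf = %.4f%n", globalIdf(subs));
    }
}
```

With per-sub-index idf, the same term would get a weight of about 7.2 in one index and 1.7 in the other; the global table gives both sub-searchers a single value (about 2.4).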

  > This means you would remove the idf contribution from the document
  > vector and leave the one from the query vector?

That seems right. As the code is currently factored, it is the
query-vector factor that determines the normalization, and idf should
remain in the normalization.
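A minimal sketch of the variant discussed here, assuming a flat query with one weight per term: the document-vector idf is dropped, so idf enters each per-term weight once, while queryNorm is still computed from (boost*idf)^2. The names are hypothetical; this illustrates the proposal, not current Lucene behaviour.

```java
/**
 * Hypothetical sketch: per-term query weights with idf squared (current
 * factoring) versus idf applied once (document-vector idf removed).
 */
public class IdfOnce {

    /** queryNorm = 1 / sqrt(sum of (boost * idf)^2); idf stays in the normalization. */
    public static double queryNorm(double[] boost, double[] idf) {
        double sumSq = 0.0;
        for (int i = 0; i < boost.length; i++) {
            double w = boost[i] * idf[i];
            sumSq += w * w;
        }
        return 1.0 / Math.sqrt(sumSq);
    }

    /** Per-term weight queryNorm * boost * idf^idfPower (2 = current, 1 = variant). */
    public static double[] termWeights(double[] boost, double[] idf, int idfPower) {
        double norm = queryNorm(boost, idf);
        double[] w = new double[boost.length];
        for (int i = 0; i < boost.length; i++) {
            w[i] = norm * boost[i] * Math.pow(idf[i], idfPower);
        }
        return w;
    }

    public static void main(String[] args) {
        double[] boost = {1.0, 1.0};
        double[] idf = {4.0, 2.0};
        double[] squared = termWeights(boost, idf, 2);
        double[] once = termWeights(boost, idf, 1);
        // Relative weight of the rarer term compared with the more common one.
        System.out.println("idf^2: " + squared[0] / squared[1]); // prints "idf^2: 4.0"
        System.out.println("idf:   " + once[0] / once[1]);       // prints "idf:   2.0"
    }
}
```

With idf squared, a term whose idf is twice another's gets four times the weight; with idf applied once, the ratio follows idf directly, and the normalization is unchanged.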

Chuck

  > -----Original Message-----
  > From: Christoph Goller [mailto:[EMAIL PROTECTED]
  > Sent: Friday, January 28, 2005 1:29 AM
  > To: Lucene Developers List
  > Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq()?
  > 
  > Chuck Williams wrote:
  > > Actually, the query normalization is a third idf factor (in a
  > > different form: summed, and square-rooted in the denominator).
  > >
  > > I.e., for a simple BooleanQuery:
  > >
  > > score(query, doc) =
  > >   coord*queryNorm*
  > >     sum[ term in query :
  > >          idf(term)*boost(term)*idf(term)*tf(term, doc)*docNorm(doc)
  > >        ]
  > >
  > > where queryNorm = 1/sqrt(sum[ term in query : (boost(term)*idf(term))^2 ])
  > >
  > > So only the Scorer terms tf(term, doc) and docNorm(doc) depend on the
  > > doc. Everything else depends only on the boosts and idfs, so it can be
  > > computed by a MultiSearcher augmented with a global idf table.
  > >
  > > To be explicit, the queryNorm could also be factored into the boost if
  > > that implementation is desired. The boost computed by MultiSearcher
  > > could then include every term in the formula above except
  > > tf(term,doc)*docNorm(doc).
  > 
  > Great. You are right, Chuck.
  > The Similarity specified for the search has to be modified so that both
  > idf(...) AND queryNorm(...) always return 1. As you say, everything
  > except tf(term,doc)*docNorm(doc) could then be precomputed into the
  > boosts of the rewritten query. The coord/tf/sloppyFreq computations
  > would be done locally by the Searchables, as specified for this search.
  > 
  > So the changes for the MultiSearcher bug would remain local to
  > MultiSearcher. I think this would be a very clean solution. What do
  > others think?
  > 
  > > However, there may be one problem with this approach. It loses
  > > information that might be necessary for a proposal of mine to fix
  > > Lucene's normalization (again, discussed ad nauseam in an earlier
  > > thread). I'm not sure whether that algorithm could be done in concert
  > > with the boost-based MultiSearcher rewriting approach (nor am I sure
  > > that it couldn't).
  > >
  > > Re: idf^2, it's the squaring in the numerator that I think is bogus:
  > 
  > This means you would remove the idf contribution from the document
  > vector and leave the one from the query vector?
  > 
  > regards,
  > Christoph
  > 
  > ---------------------------------------------------------------------
  > To unsubscribe, e-mail: [EMAIL PROTECTED]
  > For additional commands, e-mail: [EMAIL PROTECTED]
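The scoring formula Chuck gives above, and the idea of folding every document-independent factor into the boosts of the rewritten query, can be checked numerically. This is a minimal sketch on plain arrays, not Lucene code: the class and method names are hypothetical, the idf values are assumed to come from a global table, and queryNorm follows the prose description (square root of the summed squares in the denominator, as in DefaultSimilarity). As in the formula, only tf and docNorm are treated as per-document inputs, with coord passed in as computed locally.

```java
/**
 * Hypothetical sketch: the BooleanQuery score formula computed directly, and
 * the same score computed locally from precomputed ("folded") boosts with
 * idf == 1 and queryNorm == 1.
 */
public class FoldedBoosts {

    /** queryNorm = 1 / sqrt(sum over terms of (boost * idf)^2). */
    public static double queryNorm(double[] boost, double[] idf) {
        double sumSq = 0.0;
        for (int i = 0; i < boost.length; i++) {
            double w = boost[i] * idf[i];
            sumSq += w * w;
        }
        return 1.0 / Math.sqrt(sumSq);
    }

    /** Direct score: coord * queryNorm * sum(idf * boost * idf * tf * docNorm). */
    public static double score(double coord, double[] boost, double[] idf,
                               double[] tf, double docNorm) {
        double norm = queryNorm(boost, idf);
        double sum = 0.0;
        for (int i = 0; i < boost.length; i++) {
            sum += idf[i] * boost[i] * idf[i] * tf[i] * docNorm;
        }
        return coord * norm * sum;
    }

    /** What a MultiSearcher could precompute per term: queryNorm * boost * idf^2. */
    public static double[] foldedBoosts(double[] boost, double[] idf) {
        double norm = queryNorm(boost, idf);
        double[] folded = new double[boost.length];
        for (int i = 0; i < boost.length; i++) {
            folded[i] = norm * boost[i] * idf[i] * idf[i];
        }
        return folded;
    }

    /** Local score on a sub-index: only coord, tf and docNorm are computed locally. */
    public static double localScore(double coord, double[] folded, double[] tf, double docNorm) {
        double sum = 0.0;
        for (int i = 0; i < folded.length; i++) {
            sum += folded[i] * tf[i] * docNorm;
        }
        return coord * sum;
    }

    public static void main(String[] args) {
        double[] boost = {1.0, 2.0};
        double[] idf = {3.0, 1.5};             // assumed global idf values
        double[] tf = {1.0, Math.sqrt(2.0)};   // assumed tf values for one document
        double coord = 1.0;
        double docNorm = 0.5;
        System.out.println("direct: " + score(coord, boost, idf, tf, docNorm));
        System.out.println("folded: " + localScore(coord, foldedBoosts(boost, idf), tf, docNorm));
    }
}
```

The two printed scores agree up to floating-point rounding, which is the property the boost-based rewrite relies on: the sub-searchers never need idf or queryNorm themselves.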


