Christoph Goller wrote:
> Chuck Williams wrote:
>> score(query, doc) =
>>   coord*queryNorm*
>>   sum[ term in query :
>>     idf(term)*boost(term)*idf(term)*tf(term, doc)*docNorm(doc)
>>   ]
>>
>> where queryNorm = 1/sum[ term in query : (boost(term)*idf(term))^2 ]
>> [...] The MultiSearcher boost could
>> be all terms in the formula above except for tf(term,doc)*docNorm(doc).
>
> Great. You are right Chuck.
> The similarity specified for the search has to be modified so that both
> idf(...) AND queryNorm(...) always return 1 and as you say everything
> except for tf(term,doc)*docNorm(doc) could be precompiled into the boosts
> of the rewritten query. coord/tf/sloppyFreq computation would be done
> locally by the Searchables as specified for this search.
>
> So the changes for the MultiSearcher bug would remain locally in MultiSearcher.
> I think this would be a very clean solution. What do others think?
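For reference, here is my own restatement of the split described in the quoted text. The name g(t) is introduced here only for illustration. It collects everything that does not depend on the document, and queryNorm is taken exactly as written in Chuck's formula. Following Christoph's remark, coord stays with the local Searchables.

```latex
\begin{aligned}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\,\sum_{t \in q} g(t)\,\mathrm{tf}(t,d)\,\mathrm{docNorm}(d),\\
g(t) &= \mathrm{queryNorm}(q)\,\mathrm{boost}(t)\,\mathrm{idf}(t)^{2},\\
\mathrm{queryNorm}(q) &= \frac{1}{\sum_{t \in q}\bigl(\mathrm{boost}(t)\,\mathrm{idf}(t)\bigr)^{2}}.
\end{aligned}
```

Only g(t) needs global (cross-index) statistics. The remaining factors can be computed by each Searchable on its own.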
I have just added a new version of the patch to Bugzilla which goes in this direction. Everything except the boost readjustment is implemented now (there is a temporary replacement to make the patch work anyway).
There are two reasons why I haven't yet implemented the boost factor adjustment as proposed by Christoph:

1. I'm still in the process of acquiring a detailed understanding of how and where all the weighting happens.

2. (This is the more important point.) While at first I considered it a good idea to use the boost as a correction factor, I'm no longer so sure. When I started the implementation, I soon realized that I was essentially repeating the weight/scorer preparation process outside of the query. In other words, I was duplicating program logic. Thus, if the Lucene query evaluation process changes in the future, the MultiSearcher will have to be changed along with it every time, and this smells bad (as the XP-ers would say).
Now, here is my suggestion for what to do instead: if we can precalculate this factor before evaluating each single document, why don't we do that in all cases? I'm imagining something like a second rewrite step which prepares the weights as outlined by Chuck and Christoph, and which is done before every scoring. In the non-distributed case this step would simply be executed before creating the scorers; in the distributed case it would be executed by the MultiSearcher, and the prepared query would then be distributed to all Searchables.
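To make the idea more concrete, here is a minimal sketch of such a preparation step. It is not code from the patch and not Lucene API: the class and all method names are hypothetical, terms are plain strings, and the idf formula (log(numDocs/(docFreq+1)) + 1) is an assumption. queryNorm is taken literally from the quoted formula; as far as I know, Lucene's DefaultSimilarity uses the reciprocal square root of that sum instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a "second rewrite step"; none of these names are Lucene API. */
final class GlobalWeightSketch {

    /** Step 1: sum the per-Searchable document frequencies for each term. */
    public static Map<String, Integer> aggregateDocFreqs(List<Map<String, Integer>> perSearchable) {
        Map<String, Integer> total = new LinkedHashMap<>();
        for (Map<String, Integer> docFreqs : perSearchable) {
            for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    /** Assumed idf formula: log(numDocs / (docFreq + 1)) + 1. */
    public static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    /**
     * Step 2: compute the document-independent factor for every term,
     * i.e. queryNorm * boost(term) * idf(term)^2, from global statistics.
     */
    public static Map<String, Double> prepareWeights(Map<String, Double> boosts,
                                                     Map<String, Integer> globalDocFreqs,
                                                     int globalNumDocs) {
        Map<String, Double> idfs = new LinkedHashMap<>();
        double sumOfSquares = 0.0;
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            int docFreq = globalDocFreqs.getOrDefault(e.getKey(), 0);
            double termIdf = idf(docFreq, globalNumDocs);
            idfs.put(e.getKey(), termIdf);
            double w = e.getValue() * termIdf;
            sumOfSquares += w * w;
        }
        // queryNorm exactly as written in the quoted formula (no square root).
        double queryNorm = sumOfSquares == 0.0 ? 1.0 : 1.0 / sumOfSquares;

        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : boosts.entrySet()) {
            double termIdf = idfs.get(e.getKey());
            weights.put(e.getKey(), queryNorm * e.getValue() * termIdf * termIdf);
        }
        return weights;
    }

    /** Step 3, done locally per document: coord * sum(weight * tf * docNorm). */
    public static double localScore(Map<String, Double> preparedWeights,
                                    Map<String, Double> termFreqs,
                                    double docNorm,
                                    double coord) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : preparedWeights.entrySet()) {
            double tf = termFreqs.getOrDefault(e.getKey(), 0.0);
            sum += e.getValue() * tf * docNorm;
        }
        return coord * sum;
    }
}
```

In the non-distributed case, steps 1 and 2 would run once over the single index before the scorers are created. In the distributed case, the MultiSearcher would run them and hand the prepared weights to each Searchable, which then only performs step 3.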
Would this be a reasonable approach? If yes, is someone more familiar with the query/weight internals willing to implement it? I could try to do it, but it seems that this task really needs to touch the Lucene 'kernel', and I feel rather like a newbie in this area.
If someone wants to take a look at the patch, the best place to start is MultiSearcher.prepareQueries(). I'd appreciate any comments regarding the patch. For example, I'm not too happy with the introduction of the Query.addTerms() method, but I don't know how else to get the required information.
--Wolf