On Nov 20, 2009, at 1:24 PM, Jake Mannix wrote:

> On Fri, Nov 20, 2009 at 10:08 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>> I should add in my $0.02 on whether to just get rid of queryNorm()
>> altogether:
>>
>> -1 from me, even though it's confusing, because having that call there
>> (somewhere, at least) allows you to actually compare scores across
>> queries if you do the extra work of properly normalizing the documents
>> as well (at index time).
>
> Do you have some references on this?  I'm interested in reading more on
> the subject.  I've never quite been sold on how it is meaningful to
> compare scores and would like to read more opinions.
>
> References on how people do this *with Lucene*, or just how this is done
> in general?
in general.  Academic references, etc.

> There are lots of papers on fancy things which can be done, but I'm not
> sure where to point you to start out.  The technique I'm referring to is
> really just the simplest possible thing beyond setting your weights "by
> hand": let's assume you have a boolean OR query, Q, built up out of
> sub-queries q_i (hitting, for starters, different fields, although you
> can overlap as well with some more work), each with a weight (boost)
> b_i.  If you have a training corpus (good matches, bad matches, or
> ranked lists of matches in order of relevance for the queries at hand),
> *and* scores (at the q_i level) which are comparable, then you can do a
> simple regression (linear or logistic, depending on whether you map your
> final scores to a logit or not) on the b_i to fit for the best boosts to
> use.  What is critical here is that scores from different queries are
> comparable.  If they're not, then queries where the best document scores
> 2.0 overly affect the training in comparison to the queries where the
> best possible score is 0.5 (actually, wait, it's the reverse: you're
> training to increase scores of matching documents, so the system tries
> to make that 0.5-scoring document score much higher by raising boosts
> higher and higher, while the good matches already scoring 2.0 don't need
> any more boosting, if that makes sense).

This makes sense mathematically, assuming scores are comparable.  What I
would like to get at is why anyone thinks scores are comparable across
queries to begin with.  I agree it is beneficial in some cases (as you
described) if they are.  Probably a question suited for an academic IR
list...

> There are of course far more complex "state of the art" training
> techniques, but probably someone like Ted would be able to give a better
> list of references on where it is easiest to read about those.  But I
> can try to dredge up some places where I've read about doing this, and
> post again later if I can find any.
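
Just to make sure I'm following the technique you're describing, here is a
rough sketch of the kind of fit I think you mean -- Python/scikit-learn
rather than Lucene, and the field names and numbers are made up purely for
illustration:

    # Rough sketch of the fit described above (not Lucene code).  Each
    # training example is a (query, document) pair whose features are the
    # scores of the sub-queries q_i against that document; the label says
    # whether the document was a good match.  The learned coefficients
    # play the role of the boosts b_i.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One row per (query, document) pair; columns are the q_i-level scores
    # (say title, body, anchor text).  This only makes sense if those
    # scores are on a comparable scale across queries.
    X = np.array([
        [0.8, 0.1, 0.3],   # good match for query 1
        [0.2, 0.0, 0.1],   # bad match for query 1
        [0.7, 0.4, 0.0],   # good match for query 2
        [0.1, 0.3, 0.0],   # bad match for query 2
    ])
    y = np.array([1, 0, 1, 0])  # 1 = relevant, 0 = not

    # Logistic variant: the weighted sum of sub-query scores is pushed
    # through a logit; plain linear regression on the raw scores is the
    # other variant mentioned above.
    model = LogisticRegression()
    model.fit(X, y)

    # The fitted coefficients are the boosts b_i you would hand back to
    # the sub-queries of the boolean OR query.
    print(dict(zip(["title", "body", "anchor"], model.coef_[0])))

If the raw scores are not on a common scale across queries, the
low-scoring queries dominate a fit like this, which I take to be the
imbalance you were describing.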