Hello Sebastian, thank you for sharing your experience. I am glad I am not the only person with this problem.
I have read the previous paper by Robertson et al. (http://citeseer.ist.psu.edu/robertson04simple.html), where he writes about the danger of using combined scores and provides a solution via a linear combination of TFs before inserting it into the BM25 weighting algorithm. However, this does not apply to my/your problem, since it works only with one sort of scoring function, and not, as in my case, with two sorts of scoring that are generally different. I think the paper you suggested might be closer to my needs - although I doubt it is close enough to inspire me, or even to provide some mathematical justification for simple operations between two scores (like multiplication). Are you aware of any mathematical justification for multiplying the two scores? Did you have any other motivation behind it besides its simplicity?

Regarding your solution: do you have a publication, or is a publication planned, about what you did?

Thank you in advance!

Kind Regards,
Karl

> --- Original Message ---
> From: Sebastian Marius Kirsch <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Subject: Re: About Combining Scores
> Date: Sun, 13 Nov 2005 10:10:22 +0100
>
> On Sun, Nov 13, 2005 at 12:04:41AM +0100, Karl Koch wrote:
> > My aim is to combine these two scores. The Lucene score is normalised
> > to between 0.0 and 1.0 (if the score exceeded 1.0 at some point) or
> > stays below 1.0 (if it did not). The user model looks the same in this
> > respect - although based on different data - a 1.0 means the maximum
> > of relevance and a 0.0 the minimum of relevance. At the moment I am
> > multiplying the Lucene score with the score produced by the user
> > model. This means the resulting, combined score is a number between
> > 0.0 and 1.0 and represents the merged view from both models - the IR
> > view and the view of the user model.
> I came across that question recently, too; it seems to be a rather
> under-researched topic in the literature. I used multiplication in the
> end, because it's simple, it produces reasonable results, it's not
> tunable, and it's invariant to normalization. (Don't make a model with
> tunable parameters if you don't know how to tune them ...)
>
> The most helpful paper I came across was this:
>
> http://trec.nist.gov/pubs/trec13/papers/microsoft-cambridge.web.hard.pdf
>
> It's about combining PageRank with a relevance score, but it contains
> a good description of how they arrived at their scoring formula. They
> use a linear combination of the two measures and transform them to
> have a roughly similar distribution. They then tuned the parameters
> using a test corpus (which may be difficult/impossible for your
> application). Their system was one of the best at TREC-13.
>
> Regards, Sebastian
>
> --
> Sebastian Kirsch <[EMAIL PROTECTED]> [http://www.sebastian-kirsch.org/]
>
> NOTE: New email address! Please update your address book.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
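PS: To make the multiplicative combination concrete, here is a minimal sketch in Python (the score values are invented examples, not real Lucene output):

```python
def combine_multiplicative(lucene_score, user_model_score):
    """Combine two scores that both lie in [0.0, 1.0] by multiplication.

    The result is again in [0.0, 1.0]; a document only gets a high
    combined score if BOTH models consider it relevant.
    """
    assert 0.0 <= lucene_score <= 1.0
    assert 0.0 <= user_model_score <= 1.0
    return lucene_score * user_model_score

# Invented example values:
print(combine_multiplicative(0.8, 0.5))  # -> 0.4
```

One consequence of multiplying is that a 0.0 from either model zeroes out the other model entirely, which may or may not be what one wants.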
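The linear-combination approach from the paper Sebastian mentioned could be sketched roughly like this (Python; the weight `alpha` and the min-max normalisation are my own assumptions for illustration - the paper uses its own transformation and tunes the parameters on a test corpus):

```python
def min_max_normalise(scores):
    """Rescale a list of scores to [0.0, 1.0].

    A simple way to give the two score distributions a roughly similar
    range before combining them (one possible transformation; the
    paper derives its own).
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def combine_linear(lucene_scores, user_scores, alpha=0.5):
    """Weighted linear combination of two normalised score lists.

    alpha controls how much weight the IR score gets versus the
    user-model score; it would have to be tuned on test data.
    """
    ir = min_max_normalise(lucene_scores)
    um = min_max_normalise(user_scores)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(ir, um)]


# Invented scores for three documents:
print(combine_linear([0.9, 0.5, 0.1], [0.2, 0.8, 0.4], alpha=0.5))
```

Unlike plain multiplication, this keeps a document in play even when one model gives it a 0.0, at the cost of a parameter that needs tuning.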