2010/5/5 José Ramón Pérez Agüera <jose.agu...@gmail.com>

> Hi Robert,
>
> the problem is not the linear combination of fields, the problem is to
> apply the boost factor per field after the term frequency saturation
> function and then make the linear combination of fields. Every system
> that implements BM25F, including Terrier, takes care of that, because if
> you don't do it you have a bug in your ranking function and not just a
> different ranking function.
>

José, in that case this should not be much of a problem to handle in
LUCENE-2392, because, as I mentioned, if you have a tf() or idf() it's
really because you decided to write it yourself. So you could easily
apply the boost inside your log or sqrt or whatever, if you want.

But what I propose is that we make sure the relevance functions we
provide (especially any default for 4.0) take care of this for your
structured case, while still providing the capability for someone to
get the old behavior [see below].

> If you implement this little
> change, Lucene ranking function will work properly with structured
> documents and all your other concerns about allowing users to
> implement different ranking functions for different situations will be
> not affected by this change.
>

Well, I'm not sure all my concerns go away! I think it's best to
implement a change like this in the flexible scoring framework
(LUCENE-2392), so that users, if they want, can get the old behavior:
"the bug", as you call it.

The reason I say this is that Lucene has some unusual use cases: some
people are doing scoring in very crazy ways, and if they aren't able to
get the old behavior with regard to boosting, they might be upset...
even if it is really giving them worse relevance...

-- 
Robert Muir
rcm...@gmail.com
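
To make the two orderings discussed above concrete, here is a minimal Python sketch. It is not Lucene or Terrier code: the function names, the k1 value, and the field names are all hypothetical, and idf and length normalization are left out so that only the position of the boost relative to the saturation function differs.

```python
def saturate(tf, k1=1.2):
    # BM25-style term-frequency saturation; length normalization is
    # omitted for brevity. The value approaches k1 + 1 as tf grows.
    return tf * (k1 + 1.0) / (tf + k1)


def boost_after_saturation(field_tfs, boosts, k1=1.2):
    # Saturate each field on its own, then take the boosted linear
    # combination of the per-field scores (the ordering José objects to).
    return sum(boosts[f] * saturate(tf, k1) for f, tf in field_tfs.items())


def boost_before_saturation(field_tfs, boosts, k1=1.2):
    # BM25F-style: combine the boosted per-field term frequencies first,
    # then apply the saturation function once to the combined value.
    combined_tf = sum(boosts[f] * tf for f, tf in field_tfs.items())
    return saturate(combined_tf, k1)


if __name__ == "__main__":
    boosts = {"title": 1.0, "body": 1.0}  # hypothetical field boosts
    spread = {"title": 1, "body": 1}      # term once in each field
    single = {"body": 2}                  # term twice in one field
    for name, fn in [("after", boost_after_saturation),
                     ("before", boost_before_saturation)]:
        print(name, fn(spread, boosts), fn(single, boosts))
```

With equal boosts, the "before" variant scores both documents the same, because each contains two occurrences in total. The "after" variant scores the document whose occurrences are spread over two fields higher, since each field is saturated separately and the term partly escapes saturation.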