On 2/18/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
> On Saturday 18 February 2006 02:22, Shailesh Kochhar wrote:
> > Hi,
> >
> > I'm interested in implementing a few new scoring algorithms in Lucene
> > and I was wondering if anyone had attempted this in the past and how
> > successful they had been. If there are any resources that someone
> > could point me to that would be great, Googling and searching the
> > mailing-list archives didn't turn up anything.
> >
> > After looking over the current implementation of tf-idf scoring, I
> > concluded that the weighting and scoring framework is mostly
> > implemented in TermQuery and TermScorer classes. I am thinking of
> > extending these classes and replacing a few others to implement the
> > new algorithm. Am I heading in the right direction? Does it make sense
> > to try and extend these classes or should I try building a parallel
> > hierarchy to do this?
>
> At the moment I only have time to answer with links:
>
> http://issues.apache.org/jira/browse/LUCENE-293
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200410.mbox/<200410172050.24372.paul.elschot%40xs4all.nl>
> http://www.loc.gov/standards/sru/cql/
> http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/

I have a question about the sumOfSquaredWeights method. As I understand it, it computes the square of the idf for a given term, which is then used to normalize the weights of the individual terms in the query.

The scoring algorithm I'm implementing uses a different query normalization, so it doesn't need sumOfSquaredWeights. However, the method is called from a number of different places, which makes it hard to remove. I could easily compute my query normalization factor in this method, but its name would then be very misleading. Is there something I'm missing about this method, or is it a good candidate for renaming to something broader?

I feel that the components of the scoring framework are too tightly coupled, which makes swapping in a new algorithm quite difficult. Ideally one should only have to extend the Similarity, Query and Scorer classes.

Thoughts and comments?

- Shailesh
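
P.S. To make the question concrete, here is a rough, standalone sketch of the normalization flow as I read it: each term weight reports a value, the values are summed, the sum is turned into a query norm, and the norm is pushed back into every weight. This is not Lucene code. The class and method names (QueryNormSketch, TermWeightSketch, normalizationContribution, normalizeAll) are made up for illustration, and the 1/sqrt(sum) norm and the (idf * boost)^2 contribution are my assumptions about the default tf-idf behaviour.

```java
import java.util.Arrays;
import java.util.List;

public class QueryNormSketch {

    /** Hypothetical stand-in for a per-term weight; this is not Lucene's Weight interface. */
    interface TermWeightSketch {
        /** Plays the role that sumOfSquaredWeights() plays today, under a broader name. */
        float normalizationContribution();

        /** Receives the query-wide normalization factor. */
        void normalize(float queryNorm);

        /** Final weight of the term after normalization. */
        float value();
    }

    /** tf-idf flavour (assumed): contributes (idf * boost)^2 to the sum. */
    static final class TfIdfTermWeight implements TermWeightSketch {
        private final float idf;
        private float queryWeight;
        private float value;

        TfIdfTermWeight(float idf, float boost) {
            this.idf = idf;
            this.queryWeight = idf * boost;
        }

        public float normalizationContribution() {
            return queryWeight * queryWeight;
        }

        public void normalize(float queryNorm) {
            queryWeight *= queryNorm;   // scale the query-side weight
            value = queryWeight * idf;  // idf applied once more for the document side
        }

        public float value() {
            return value;
        }
    }

    /** Assumed default norm: 1 / sqrt(sum of contributions). */
    static float queryNorm(float sum) {
        return (float) (1.0 / Math.sqrt(sum));
    }

    /** Sums the contributions, derives the norm, and hands it back to every weight. */
    static void normalizeAll(List<TermWeightSketch> weights) {
        float sum = 0f;
        for (TermWeightSketch w : weights) {
            sum += w.normalizationContribution();
        }
        float norm = queryNorm(sum);
        for (TermWeightSketch w : weights) {
            w.normalize(norm);
        }
    }

    /** Two terms with idf 3 and 4 and boost 1; returns their normalized values. */
    static float[] demo() {
        TermWeightSketch a = new TfIdfTermWeight(3.0f, 1.0f);
        TermWeightSketch b = new TfIdfTermWeight(4.0f, 1.0f);
        normalizeAll(Arrays.asList(a, b));
        return new float[] { a.value(), b.value() };
    }

    public static void main(String[] args) {
        float[] v = demo();
        System.out.println(v[0] + " " + v[1]);
    }
}
```

With a neutral name like normalizationContribution, a different algorithm could return whatever its own normalization needs and pair it with its own queryNorm, without the method name implying a sum of squares.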
