On Monday 06 June 2005 22:59, Andy Liu wrote: > Is there a way to calculate term frequency scores that are relative to > the number of terms in the field of the document? We want to override > tf() in this way to curb keyword spamming in web pages. In > Similarity, only the document's term frequency is passed into the tf() > method: > > float tf(int freq) > > It would be nice to have something like: > > float tf(int freq, String fieldName, int numTerms) > > If this isn't available out of the box, how difficult would it be to > hack up Lucene to allow for this?
Have a look here: http://issues.apache.org/bugzilla/show_bug.cgi?id=31784 It scores terms by density and it uses a separate table mapping the norms stored in the index to inverse doc lengths. This table could be adapted as needed. When that is not enough, it's probably a good start for what you need. Regards, Paul Elschot. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
