Re: Relative term frequency?

Paul Elschot Tue, 07 Jun 2005 00:01:57 -0700

On Monday 06 June 2005 22:59, Andy Liu wrote:
> Is there a way to calculate term frequency scores that are relative to
> the number of terms in the field of the document?  We want to override
> tf() in this way to curb keyword spamming in web pages.  In
> Similarity, only the document's term frequency is passed into the tf()
> method:
> 
> float tf(int freq)
> 
> It would be nice to have something like:
> 
> float tf(int freq, String fieldName, int numTerms)
> 
> If this isn't available out of the box, how difficult would it be to
> hack up Lucene to allow for this?


Have a look here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=31784

It scores terms by density, using a separate table that maps
the norms stored in the index to inverse document lengths.
This table can be adapted as needed.
Even if that is not enough, it is probably a good starting point
for what you need.
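
To illustrate the idea only, here is a minimal sketch of density scoring.
It is not the code attached to the issue above and it does not use the
Lucene classes; the class and method names are hypothetical. It assumes the
default length norm of 1/sqrt(numTerms), so that squaring the norm gives the
inverse field length.

```java
// Hypothetical sketch; not Lucene code and not the patch from the issue above.
final class RelativeTfSketch {

    // Assumed default length norm: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Squaring that norm recovers the inverse field length, 1 / numTerms.
    static float inverseLength(float norm) {
        return norm * norm;
    }

    // Density-style tf: raw term frequency scaled by the inverse field length.
    static float densityTf(int freq, float norm) {
        return freq * inverseLength(norm);
    }
}
```

In a real index the norm is stored as a single lossy byte per field and
document, and it also carries any field boost, so a lookup table from each
possible byte value to an inverse length would probably replace
inverseLength() here, and the result would only be approximate.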

Regards,
Paul Elschot.

