Re: Changing ranking

Marvin Humphrey Thu, 23 Mar 2006 11:45:56 -0800


On Mar 23, 2006, at 11:22 AM, Otis Gospodnetic wrote:

The place to start would be to look at the DefaultSimilarity, and the norms method there. Perhaps you want to create your own Similarity implementation that returns either a constant 1 or something else that will favour longer text. Somebody else with more experience in this area may have better or more precise suggestions.

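Here's an implementation of lengthNorm() that stops the weighting at 100 tokens: any field shorter than 100 tokens is treated as if it were 100 tokens long, so very short fields no longer get an outsized norm.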


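  // Override in your own Similarity subclass (e.g. one extending DefaultSimilarity).
  public float lengthNorm(String fieldName, int numTerms) {
    // Treat any field shorter than 100 tokens as if it had exactly 100.
    numTerms = numTerms < 100 ? 100 : numTerms;
    return (float)(1.0 / Math.sqrt(numTerms));
  }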
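
To make the effect concrete, here is a small standalone sketch that compares the plain 1/sqrt(numTerms) norm with the floored version above. It has no Lucene dependency; the class name LengthNormDemo and the helper names plainNorm and flooredNorm are made up for illustration only.

```java
public class LengthNormDemo {
    // Plain norm: 1/sqrt(numTerms), i.e. the method above without the floor.
    static float plainNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Floored norm: fields shorter than 100 tokens are treated as 100 tokens long.
    static float flooredNorm(int numTerms) {
        int effective = Math.max(numTerms, 100);
        return (float) (1.0 / Math.sqrt(effective));
    }

    public static void main(String[] args) {
        int[] lengths = {1, 10, 100, 400, 10000};
        for (int n : lengths) {
            // Below 100 tokens the two columns differ; from 100 tokens up they agree.
            System.out.printf("%6d tokens: plain=%.4f floored=%.4f%n",
                    n, plainNorm(n), flooredNorm(n));
        }
    }
}
```

Every field of 100 tokens or fewer ends up with the same norm, while longer fields are still penalised exactly as before.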

If you adopt it, you must boost short but important fields (e.g. title), or they won't contribute enough.
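
As a rough starting point for choosing such a boost, the sketch below computes the factor that would give a short field the same norm it had without the floor. This is an assumption for illustration, not something from KinoSearch or Lucene itself: the class TitleBoostDemo and the helper compensatingBoost are hypothetical, and the right value in practice is a matter of tuning. In Lucene the result would typically be applied at index time via Field.setBoost(), since the field boost is multiplied into the stored norm.

```java
public class TitleBoostDemo {
    static final int FLOOR = 100; // same cutoff as in lengthNorm() above

    // The floored length norm: short fields are treated as FLOOR tokens long.
    static float flooredNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(Math.max(numTerms, FLOOR)));
    }

    // Hypothetical helper: boost that cancels the floor for a short field,
    // i.e. flooredNorm(n) * compensatingBoost(n) == 1/sqrt(n) for n < FLOOR.
    static float compensatingBoost(int numTerms) {
        if (numTerms >= FLOOR) {
            return 1.0f; // the floor had no effect, so nothing to compensate
        }
        return (float) Math.sqrt((double) FLOOR / numTerms);
    }

    public static void main(String[] args) {
        int titleTokens = 4; // an assumed typical title length
        float boost = compensatingBoost(titleTokens);
        System.out.println("boost for a " + titleTokens + "-token title: " + boost);
        System.out.println("effective norm: " + flooredNorm(titleTokens) * boost);
    }
}
```

A smaller boost than this keeps some of the benefit of the floor; a larger one deliberately favours the title over the body.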

KinoSearch (my loose Perl/C port of Lucene) uses this algorithm, and it seems to work well.

To see an earlier discussion on this subject, perform a web search for "proposal defaultsimilarity lengthnorm".


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

