I've done some customization of scoring/ranking and plan to do more. A good place to start is with your own Similarity, extending Lucene's DefaultSimilarity. Like you, I found that the default length normalization does not work well with my dataset. I weight each indexed field separately according to a static relative importance (implemented as a query boost factor that is applied automatically) and then disable length normalization altogether by redefining lengthNorm() to always return 1.0f.
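A rough sketch of that idea follows, written as plain static methods so it runs without Lucene on the classpath. It is not the code from the setup described above. In a real Similarity subclass, lengthNorm would be an override of lengthNorm(String fieldName, int numTokens); the class name, the field names ("title", "body") and the boost values are made up for illustration. The boost is expressed with the query parser's field:term^boost syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FlatNormSketch {
    // Hypothetical static per-field weights; names and values are assumptions.
    static final Map<String, Float> FIELD_BOOSTS = new LinkedHashMap<>();
    static {
        FIELD_BOOSTS.put("title", 4.0f);
        FIELD_BOOSTS.put("body", 1.0f);
    }

    // Length normalization disabled: every field length gets the same norm.
    static float lengthNorm(String fieldName, int numTokens) {
        return 1.0f;
    }

    // For comparison, DefaultSimilarity's formula: 1 / sqrt(numTokens),
    // which favors very short fields.
    static float defaultLengthNorm(String fieldName, int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Expand one term across all weighted fields, e.g. "title:hello^4.0 body:hello^1.0".
    static String boostedQuery(String term) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : FIELD_BOOSTS.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append(':').append(term).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(boostedQuery("hello"));
        System.out.println(defaultLengthNorm("body", 2) + " vs " + lengthNorm("body", 2));
    }
}
```

With the norm fixed at 1.0f, a two-word document no longer outranks a long one purely because it is short; relative field importance comes only from the boosts.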
I also had problems with tf and idf normalization, especially with idf dominating the ranking. To address that, my Similarity increases the base of the log for each, and adds a final square root to the idf computation, since Lucene squares the idf in the score computation.

Have you tried the explain() mechanism? It is a great way to see precisely how your results are being scored. Be warned that there is a final normalization in Hits that explain() does not show; it does not affect the ranking order, but it does affect the final scores.

Chuck

> -----Original Message-----
> From: Sanyi [mailto:[EMAIL PROTECTED]
> Sent: Saturday, November 13, 2004 12:38 AM
> To: [EMAIL PROTECTED]
> Subject: Anyone implemented custom hit ranking?
>
> Hi!
>
> I have problems with short text ranking. I've read about the same ranking
> problems in the list archives, but found only hints and thoughts (adjust
> DefaultSimilarity, Similarity, etc...), not complete solutions with source
> code.
> Anyone implemented a good solution for this problem? (example: my search
> application returns about 10-20 pages of 1-2 word hits for "hello", and
> then it starts to list the longer texts)
> I've implemented a very simple solution: I boost documents shorter than
> 300 chars with 1/300*doclength at index time. Now it works a lot better.
> In fact, I can't see any problems now.
> Anyway, I think this is not "the solution", this is a patch or workaround.
> So, I'd be interested in some kind of well designed complete solution for
> this problem.
>
> Regards,
> Sanyi
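The tf and idf damping mentioned in the reply can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Similarity from the reply: the log base of 10, the "1 + log" form of tf, and the class name are all guesses. For reference, DefaultSimilarity uses tf = sqrt(freq) and idf = ln(numDocs / (docFreq + 1)) + 1. The methods mirror the shapes of Similarity.tf(float) and Similarity.idf(int, int) but are plain static methods so the example runs without Lucene.

```java
class DampedTfIdfSketch {
    // Assumed log base; the reply only says the base was increased.
    static final double LOG_BASE = 10.0;

    static double logBase(double x) {
        return Math.log(x) / Math.log(LOG_BASE);
    }

    // Log-damped term frequency (assumed form): 1 + log_b(freq), 0 when absent.
    static float tf(float freq) {
        return freq > 0 ? (float) (1.0 + logBase(freq)) : 0.0f;
    }

    // Damped idf with a final square root, so that squaring it during
    // scoring leaves roughly 1 + log_b(numDocs / (docFreq + 1)).
    static float idf(int docFreq, int numDocs) {
        double raw = 1.0 + logBase(numDocs / (double) (docFreq + 1));
        return (float) Math.sqrt(Math.max(0.0, raw)); // guard against empty index
    }

    public static void main(String[] args) {
        float rare = idf(9, 1000);
        float common = idf(499, 1000);
        System.out.println("rare idf^2   = " + rare * rare);
        System.out.println("common idf^2 = " + common * common);
    }
}
```

For a term in 9 of 1000 documents, the default idf squared is about 31, while this sketch gives about 3, so a rare term has much less room to dominate the score.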