Lucene uses a variation of TF-TDF for the similarity score where the
document length is one of the factors. In the future you will be able
to modify scoring to your needs: 
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg01727.html

There might be more in the FAQ or in the mail archive. This is
something that has been discussed quite often. For example, you could
learn about the scoring mechanism in:
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31

--- Chris Sibert <[EMAIL PROTECTED]> wrote:
> I am disatisfied with the document scores that I'm getting. If a
> document is short, and has one occurrence of the search term, it is
> ranked higher than a longer document with two occurrences of the
> term. This makes little sense to me, and I'd like the longer document
> with more occurrences to be ranked higher. I figured I have to
> override the scoring method, but I can't find where Lucene actually
> does the scoring. This is actually not an uncommon problem for me, as
> I find perusing the API to be high on the confusing scale, due to the
> lack of comprehensive Javadoc documentation. (Something that even Sun
> doesn't spend much time on.) I attempt to read the code, but variable
> names are terse, and there's a dearth of commenting, which makes it
> fairly unfathomable. 


=====
__________________________________
[EMAIL PROTECTED] -- http://www.lissus.com

__________________________________________________
Do You Yahoo!?
Yahoo! Finance - Get real-time stock quotes
http://finance.yahoo.com

--
To unsubscribe, e-mail:   <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>

Reply via email to