Have a look at the Similarity class and also the Scoring section of the website (Documentation -> Scoring on the left-hand side). This is a classic problem of dealing with TF/IDF and length normalization. Lucene makes general assumptions about what is best, but it does allow you to tune as well (which can be an abyss one never returns from if one isn't careful).
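
To make the length normalization effect concrete, here is a minimal standalone sketch (it does not use Lucene itself). It assumes the classic defaults of tf = sqrt(freq) and lengthNorm = 1/sqrt(numTokens), and it leaves out idf, boosts, coord and queryNorm because they are the same for both documents in a single-term query. The class name, the flat norm, and the token and occurrence counts are made up for illustration. In Lucene 2.x the equivalent change would, as far as I know, be a DefaultSimilarity subclass that overrides lengthNorm(String fieldName, int numTokens), set on both the IndexWriter and the Searcher via setSimilarity(). Since norms are stored at index time, that means reindexing.

```java
// Standalone illustration of why a short document can outrank a long one.
// This is NOT Lucene code; it only mirrors the assumed default formulas.
final class LengthNormSketch {

    // Assumed classic default: term frequency factor grows with sqrt(freq).
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed classic default: fields with more tokens get a smaller norm.
    static float defaultLengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // Hypothetical alternative: ignore field length entirely.
    static float flatLengthNorm(int numTokens) {
        return 1.0f;
    }

    // Simplified per-document score: only the factors that differ per document.
    static float score(int freq, float lengthNorm) {
        return tf(freq) * lengthNorm;
    }

    public static void main(String[] args) {
        // Made-up numbers: a 10-token slide mentioning "java" once,
        // and a 25000-token Word document mentioning it 400 times.
        int slideTokens = 10, slideFreq = 1;
        int docTokens = 25000, docFreq = 400;

        System.out.println("default norm, slide: "
                + score(slideFreq, defaultLengthNorm(slideTokens)));
        System.out.println("default norm, doc:   "
                + score(docFreq, defaultLengthNorm(docTokens)));
        System.out.println("flat norm, slide:    "
                + score(slideFreq, flatLengthNorm(slideTokens)));
        System.out.println("flat norm, doc:      "
                + score(docFreq, flatLengthNorm(docTokens)));
    }
}
```

With the default norm the slide scores higher than the long document. With the flat norm the order flips, because only term frequency is left. A flat norm is the bluntest possible choice. Something in between (for example, a norm that stops shrinking beyond some length) may rank better, but that is exactly the kind of tuning that needs testing against real queries.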

-Grant

On Jan 11, 2008, at 9:56 AM, thrgroovyboy wrote:


Hi,

When I search with Lucene, the scoring formula takes the total number of
words in the document into account.

For example, an indexed single PowerPoint slide containing the term "JAVA" is
ranked as more relevant than a 50-page Word document on JAVA.

This is a problem for me: the Word document on Java should be more relevant
than the single PPT slide...

Is there something that I can do?

Thanks a lot,

Fab
--
View this message in context: 
http://www.nabble.com/Question-about-Search-formula-tp14757377p14757377.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




