I would like to know if there is a simple way to force Lucene to adopt the simple cosine similarity of the term frequency vectors of the documents and the query for ranking the result. In practice the score sc_i of the document i should be given by:
sc_i = (D_i*Q)/(|D_i|*|Q|) where D_i = vector of the term frequencies of document i; Q = vector of the term frequencies of the Query; * = scalar product; || = norm of the vector (the square root of the sum of the squares of the entries of the vector). I wasn't able to find a way to evaluate |D_i|. Thank you Claudio ====================================== Ing. Claudio Gennaro, PhD ISTI (Information Science and Technology Institute) Consiglio Nazionale delle Ricerche Area della Ricerca di Pisa (room I14) Via G. Moruzzi 1 56124 Pisa - ITALY phone: +39 050 315 3077 mobile: +39 328 92 16 734 fax: +39 050 315 3464 or +39 050 315 2810 e-mail: claudio.genn...@isti.cnr.it --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org