Hi all,
Lucene score document based on the correlation between
the query q and document t:
(this is raw function, I don't pay attention to the 
boost_t, coord_q_d factor)

score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t
/ norm_d_t)  (*)

Could anybody explain it in detail ? Or are there any
papers, documents about this function ? Because:

I have also read the book: Modern Information
Retrieval, author: Ricardo Baeza-Yates and Berthier 
Ribeiro-Neto, Addison Wesley (Hope you have read it
too). In page 27, they also suggest a scoring funtion
for vector model based on the correlation between
query q and document d as follow (I use different
symbol):

                 sum_t( weight_t_d * weight_t_q) 
score_d(d, q)=  --------------------------------- (**)
                      norm_d * norm_q 

where weight_t_d = tf_d * idf_t
      weight_t_q = tf_q * idf_t
      norm_d = sqrt( sum_t( (tf_d * idf_t)^2 ) )
      norm_q = sqrt( sum_t( (tf_q * idf_t)^2 ) )

(**):          sum_t( tf_q*idf_t * tf_d*idf_t) 
score_d(d, q)=---------------------------------  (***)
                   norm_d * norm_q 

The two function, (*) and (***), have 2 differences:
1. in (***), the sum_t is just for the numerator but
in the (*), the sum_t is for everything. So, with
norm_q = sqrt(sum_t((tf_q*idf_t)^2)); sum_t is
calculated twice. Is this right? please explain.

2. No factor that define norms of the document: norm_d
in the function (*). Can you explain this. what is the
role of factor norm_d_t ?

One more question: could anybody give me documents,
papers that explain this function in detail. so when I
apply Lucene for my system, I can adapt the document,
and the field so that I still receive the correct
scoring information from Lucene .

Best regard,
Thanks every body,

=====
Ð#7863;ng Nhân 





                
__________________________________ 
Do you Yahoo!? 
Yahoo! Mail - Find what you need with new enhanced search.
http://info.mail.yahoo.com/mail_250

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to