Resending, since my question didn't get much attention.

---------- Forwarded message ----------
From: Kasun Perera <kas...@opensource.lk>
Date: Tue, Jun 19, 2012 at 3:26 PM
Subject: Different Weights to Lucene fields with Okapi Similarity
To: java-user@lucene.apache.org

Based on this link http://www2002.org/CDROM/refereed/643/node6.html , I'm calculating the Okapi similarity between a query document and another document using Lucene, as described below.

I have indexed the documents using 3 fields, and I want to give a higher weight to field 2 and field 3. I can't use Lucene's boost function since I'm using my own similarity function. Can anyone suggest a way to give different weights to fields with this Okapi similarity function?

This is the Okapi similarity scheme that I have used:

    sim(query, doc) = sum(t in terms(query), freq(t, query) * w(t, doc))

where (from the second link, slightly modified as I think the formula in the link is incorrect)

    w(t, doc) = idf(t) * (k+1)*freq(t, doc) / (k*(1-b + b*ls(doc)) + freq(t, doc))
    ls(doc) = len(doc)/avgdoclen

and idf(t) is

    idf(t) = log((totalNumIndexedDocs - docFreq + 0.5)/(docFreq + 0.5))

Here freq(t, doc) is the frequency of term t in document doc. Choosing b = 0.75 and k = 1.2 you get

    w(t, doc) = idf(t) * 2.2*freq(t, doc) / (1.2*(0.25+0.75*ls(doc)) + freq(t, doc))

--
Regards
Kasun Perera
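
To make the formulas above concrete, here is a minimal Java sketch of the scoring, written without any Lucene calls. The class name OkapiSketch and all method names are hypothetical. The constant b = 0.75 is chosen because it reproduces the expanded form 1.2*(0.25+0.75*ls(doc)). The combine method is only an assumption about what "different weights per field" could mean (a weighted sum of per-field similarities); it is not something the formulas above or Lucene prescribe.

```java
final class OkapiSketch {
    // k and b as in the expanded formula: (k+1) = 2.2 and 1.2*(0.25 + 0.75*ls(doc)).
    static final double K = 1.2;
    static final double B = 0.75;

    // idf(t) = log((totalNumIndexedDocs - docFreq + 0.5) / (docFreq + 0.5))
    static double idf(int totalNumIndexedDocs, int docFreq) {
        return Math.log((totalNumIndexedDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // w(t, doc) = idf(t) * (k+1)*freq(t, doc) / (k*(1-b + b*ls(doc)) + freq(t, doc))
    static double weight(double idf, double freqInDoc, double docLen, double avgDocLen) {
        double ls = docLen / avgDocLen; // ls(doc) = len(doc)/avgdoclen
        return idf * (K + 1) * freqInDoc / (K * (1 - B + B * ls) + freqInDoc);
    }

    // sim(query, doc) = sum over query terms of freq(t, query) * w(t, doc).
    // queryFreqs[i] and termWeights[i] refer to the same query term.
    static double sim(double[] queryFreqs, double[] termWeights) {
        if (queryFreqs.length != termWeights.length) {
            throw new IllegalArgumentException("one weight per query term is required");
        }
        double sum = 0.0;
        for (int i = 0; i < queryFreqs.length; i++) {
            sum += queryFreqs[i] * termWeights[i];
        }
        return sum;
    }

    // ASSUMPTION: per-field weighting as a weighted sum of per-field similarities,
    // where each field's similarity is computed from that field's own statistics.
    static double combine(double[] fieldSims, double[] fieldWeights) {
        if (fieldSims.length != fieldWeights.length) {
            throw new IllegalArgumentException("one weight per field is required");
        }
        double total = 0.0;
        for (int i = 0; i < fieldSims.length; i++) {
            total += fieldWeights[i] * fieldSims[i];
        }
        return total;
    }
}
```

Under this assumption, field 2 and field 3 would get a larger entry in fieldWeights than field 1, with each field's sim computed separately from that field's term frequencies, lengths and document frequencies.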