Re: Lucene scoring: coord_q_d factor

Grant Ingersoll Thu, 14 Dec 2006 04:31:53 -0800

FYI: The Wiki has a fair number of resources on IR: http://wiki.apache.org/jakarta-lucene/InformationRetrieval (I have added a link to this conversation, which contains a lot of useful information)

Karl, if you are so inclined, please feel free to add any references you have found helpful that aren't already on this page (anyone can edit the Wiki with a login)


-Grant

On Dec 14, 2006, at 4:59 AM, Soeren Pekrul wrote:

Soeren Pekrul wrote:
The score for a document is the sum of the term weights w(tf, idf) for each query term the document contains. So you already have the combination of coordination level matching with IDF. Now suppose your query requests three terms A, B and C. Two of them (A and B) occur quite often in the collection; one (C) is very rare. Documents matching just C could then have a higher score than documents containing both A and B. To avoid this you can give the coordination a higher influence by multiplying the sum of term weights by the coordination as an additional factor.
Addendum:
For the query Q(A, B, C) with
A: df++ (idf--)
B: df++ (idf--)
C: df-- (idf++)
the user would probably expect the following ranking:
1. D(A, B, C)
2. D(A, C), D(B, C)
3. D(A, B)
4. D(C)
5. D(A), D(B)
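
A minimal sketch of the effect described above, as a toy model rather than Lucene's actual scoring code. The idf values (1.0 for A and B, 3.0 for C), tf = 1 for every term, and a coordination factor defined as matched terms divided by query terms are all assumptions chosen for illustration.

```python
# Toy model only: hypothetical idf weights, tf assumed to be 1 everywhere.
# A and B are common (low idf); C is rare (high idf).
IDF = {"A": 1.0, "B": 1.0, "C": 3.0}
QUERY = ("A", "B", "C")

# Each document is represented by the set of query terms it contains.
DOCS = {
    "D(A,B,C)": {"A", "B", "C"},
    "D(A,C)": {"A", "C"},
    "D(B,C)": {"B", "C"},
    "D(A,B)": {"A", "B"},
    "D(C)": {"C"},
    "D(A)": {"A"},
    "D(B)": {"B"},
}


def score(doc_terms, use_coord):
    """Sum of term weights, optionally multiplied by the coordination factor."""
    matched = [t for t in QUERY if t in doc_terms]
    weight_sum = sum(IDF[t] for t in matched)
    if not use_coord:
        return weight_sum
    # Assumed definition: fraction of query terms found in the document.
    coord = len(matched) / len(QUERY)
    return weight_sum * coord


def ranking(use_coord):
    """Document names by descending score; ties are broken by name."""
    return sorted(DOCS, key=lambda name: (-score(DOCS[name], use_coord), name))


if __name__ == "__main__":
    print("without coord:", ranking(False))
    print("with coord:   ", ranking(True))
```

With these assumed weights, the plain sum ranks D(C) (3.0) above D(A, B) (2.0), while multiplying by the coordination factor gives D(A, B) about 1.33 and D(C) 1.0, which reproduces the ranking listed above. Other weights could behave differently; a sufficiently rare C would still win.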

Sören

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ



