FYI: The Wiki has a fair number of resources on IR: http://wiki.apache.org/jakarta-lucene/InformationRetrieval (I have added a link to this conversation, which contains a lot of useful information)

Karl, if you are so inclined, please feel free to add any references you have found helpful that aren't already on this page (anyone can edit the Wiki with a login).

-Grant

On Dec 14, 2006, at 4:59 AM, Soeren Pekrul wrote:

Soeren Pekrul wrote:
The score for a document is the sum of the term weights w(tf, idf) of the query terms it contains. So you already have coordination level matching combined with IDF. Now suppose your query requests three terms A, B and C. Two of them (A and B) occur quite often in the collection; one (C) is very rare. Documents matching just C could then have a higher score than documents containing both A and B. To avoid this, you can give the coordination a higher influence by multiplying the sum of term weights by the coordination as an additional factor.
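
A minimal sketch of the effect described above, in Python rather than against the Lucene API. The term weights, the names WEIGHTS, QUERY and score, and the choice of "matched terms / query terms" as the coordination factor are assumptions made for illustration only:

```python
# Illustrative term weights (assumed values, not computed by Lucene):
# A and B are frequent in the collection (low idf), C is rare (high idf).
WEIGHTS = {"A": 1.0, "B": 1.0, "C": 3.0}
QUERY = ("A", "B", "C")


def score(doc_terms, use_coord=False):
    """Sum the weights of the query terms the document contains.

    With use_coord=True the sum is multiplied by the fraction of
    query terms that matched (the assumed coordination factor).
    """
    matched = [t for t in QUERY if t in doc_terms]
    total = sum(WEIGHTS[t] for t in matched)
    if use_coord:
        total *= len(matched) / len(QUERY)
    return total


doc_c = {"C"}        # matches only the rare term
doc_ab = {"A", "B"}  # matches both frequent terms

# Plain sum of weights: the single rare term outranks the two-term match.
print(score(doc_c), score(doc_ab))
# With the coordination factor the two-term match comes out ahead.
print(score(doc_c, use_coord=True), score(doc_ab, use_coord=True))
```

With these assumed weights, D(C) beats D(A, B) on the plain sum, and the order flips once the coordination factor is applied.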

Addendum:
For the query Q(A, B, C) with
A: df++ (idf--)
B: df++ (idf--)
C: df-- (idf++)
the user would probably expect the following ranking:
1. D(A, B, C)
2. D(A, C), D(B, C)
3. D(A, B)
4. D(C)
5. D(A), D(B)
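
The expected ranking above can be checked with a small sketch. It is an illustration under assumed values, not Lucene code: the weights in WEIGHTS, the helper names coord_score and label, and the linear factor "matched terms / query terms" are all hypothetical.

```python
from itertools import combinations

# Assumed weights: A and B frequent (low idf), C rare (high idf).
WEIGHTS = {"A": 1.0, "B": 1.0, "C": 3.0}
QUERY = ("A", "B", "C")


def coord_score(doc_terms):
    """Sum of matched term weights times the fraction of query terms matched."""
    matched = [t for t in QUERY if t in doc_terms]
    return sum(WEIGHTS[t] for t in matched) * len(matched) / len(QUERY)


def label(doc_terms):
    """Render a document as D(A, B, ...) in the notation used above."""
    return "D(" + ", ".join(sorted(doc_terms)) + ")"


# One document per non-empty subset of the query terms.
docs = [frozenset(c) for n in range(1, len(QUERY) + 1)
        for c in combinations(QUERY, n)]

# Highest score first; ties are broken alphabetically by label.
ranked = sorted(docs, key=lambda d: (-coord_score(d), label(d)))
for d in ranked:
    print(f"{label(d)}: {coord_score(d):.2f}")
```

With these weights the printed order follows the list above. Note that a linear factor only keeps D(A, B) ahead of D(C) while the weight of C stays below twice the combined weight of A and B; for a rarer C a steeper coordination factor would be needed.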

Sören

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ


