FYI: The Wiki has a fair number of resources on IR:
http://wiki.apache.org/jakarta-lucene/InformationRetrieval (I have added a
link to this conversation, which contains a lot of useful information)
Karl, if you are so inclined, please feel free to add any references
you have found helpful that aren't already on this page (anyone can
edit the Wiki with a login)
-Grant
On Dec 14, 2006, at 4:59 AM, Soeren Pekrul wrote:
Soeren Pekrul wrote:
The score for a document is the sum of the term weights w(tf, idf)
for each query term it contains. So you already have a combination of
coordination level matching and IDF. Now suppose your query requests
three terms A, B and C. Two of them (A and B) occur quite often in
the collection; one (C) is very rare. Documents matching only C could
then have a higher score than documents containing both A and B. To
avoid this, you can give the coordination a higher influence by
multiplying the sum of term weights by the coordination level as an
additional factor.
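To make the effect concrete, here is a minimal sketch of the idea. It is not Lucene's actual scoring code: the `score` function, the per-term weights (tf taken as 1, so each weight stands in for an idf value) and the linear coordination factor matched/total are all assumptions chosen for illustration.

```python
def score(doc_terms, query_weights, use_coord=False):
    """Sum the weights of the query terms the document contains.

    With use_coord=True the sum is multiplied by the fraction of
    query terms matched (an assumed, linear coordination factor).
    """
    matched = [t for t in query_weights if t in doc_terms]
    total = sum(query_weights[t] for t in matched)
    if use_coord:
        total *= len(matched) / len(query_weights)
    return total


# Hypothetical weights: A and B are frequent (low idf), C is rare (high idf).
weights = {"A": 1.0, "B": 1.0, "C": 3.0}
doc_ab = {"A", "B"}
doc_c = {"C"}

plain_ab = score(doc_ab, weights)        # 1.0 + 1.0
plain_c = score(doc_c, weights)          # 3.0
coord_ab = score(doc_ab, weights, True)  # 2.0 * 2/3
coord_c = score(doc_c, weights, True)    # 3.0 * 1/3

print("without coordination:", plain_ab, plain_c)
print("with coordination:   ", coord_ab, coord_c)
```

Without the factor, the document matching only C outranks the one matching A and B; with it, the order flips for these particular weights.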
Addendum:
For the query Q(A, B, C) with
A: df++ (idf--)
B: df++ (idf--)
C: df-- (idf++)
the user would probably expect the following ranking:
1. D(A, B, C)
2. D(A, C), D(B, C)
3. D(A, B)
4. D(C)
5. D(A), D(B)
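
As a rough check of this ranking, the sketch below scores all seven document types with a sum of term weights multiplied by a linear coordination factor, then groups equal scores into tiers. The weights, the `coord_score` and `ranked_tiers` helpers and the matched/total factor are illustrative assumptions, not Lucene's implementation.

```python
from itertools import groupby

# Hypothetical weights (tf = 1): A and B frequent, C rare.
WEIGHTS = {"A": 1.0, "B": 1.0, "C": 3.0}

DOCS = {
    "D(A, B, C)": {"A", "B", "C"},
    "D(A, C)": {"A", "C"},
    "D(B, C)": {"B", "C"},
    "D(A, B)": {"A", "B"},
    "D(C)": {"C"},
    "D(A)": {"A"},
    "D(B)": {"B"},
}


def coord_score(terms):
    """Sum of matched term weights times the fraction of query terms matched."""
    matched = [t for t in WEIGHTS if t in terms]
    return sum(WEIGHTS[t] for t in matched) * len(matched) / len(WEIGHTS)


def ranked_tiers(docs):
    """Sort documents by descending score and group equal scores into tiers."""
    scored = sorted(
        ((coord_score(terms), name) for name, terms in docs.items()),
        key=lambda pair: -pair[0],
    )
    return [
        sorted(name for _, name in group)
        for _, group in groupby(scored, key=lambda pair: pair[0])
    ]


tiers = ranked_tiers(DOCS)
for rank, tier in enumerate(tiers, start=1):
    print(rank, ", ".join(tier))
```

With these weights the tiers come out in the order listed above. A linear factor does not guarantee that in general: D(A, B) stays ahead of D(C) only while w(C) < 2 * (w(A) + w(B)), so a much rarer C would need a stronger coordination factor.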
Sören
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ