Karl Koch wrote:
> The coord(q,d) normalisation is "a score factor based on how many of
> the query terms are found in the specified document." and is described
> here:
>
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
>
> Does this have a theoretical base? On what basis was the decision
> made to have it? Does anybody know a paper (in Information Retrieval,
> Information Seeking, etc.) or other more general information about
> this?

The following is quoted from:

Krovetz, R. & Croft, W. B. (1992) Lexical Ambiguity and Information Retrieval. ACM Transactions on Information Systems, 10(2): 115-141.

    Many retrieval systems represent documents and queries by the words
    they contain, and base the comparison on the number of words they
    have in common. The more words the query and document have in
    common, the higher the document is ranked; this is referred to as a
    "coordination match." Performance is improved by weighting query and
    document words using frequency information from the collection and
    individual document texts [27].

    27. Salton, G. & McGill, M. Introduction to Modern Information
        Retrieval. McGraw-Hill, New York, 1983.
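
To make the "coordination match" idea in the quoted passage concrete, here is a minimal Python sketch. It is not Lucene's code: the function names `coord` and `rank_by_coordination` and the sample documents are made up for illustration, and the sketch assumes the factor is simply the fraction of distinct query terms that appear in the document, with no term weighting at all.

```python
def coord(query_terms, doc_terms):
    """Fraction of distinct query terms that occur in the document.

    Illustrative only: assumes coord = overlap / number of query terms.
    """
    query = set(query_terms)
    if not query:
        return 0.0
    overlap = len(query & set(doc_terms))
    return overlap / len(query)


def rank_by_coordination(query_terms, docs):
    """Return document ids ordered by descending coordination match."""
    scores = {doc_id: coord(query_terms, terms) for doc_id, terms in docs.items()}
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical toy collection: document id -> list of terms.
docs = {
    "d1": ["lexical", "ambiguity", "retrieval"],
    "d2": ["lexical", "analysis"],
    "d3": ["query", "expansion"],
}
query = ["lexical", "ambiguity"]
ranking = rank_by_coordination(query, docs)
```

Here d1 matches both query terms, d2 matches one, and d3 matches none, so the documents are ranked in that order. In a weighted scheme such as the one the passage attributes to Salton & McGill, this factor would be one multiplier alongside the frequency-based term weights rather than the whole score.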