Re: Lucene scoring: coord_q_d factor

Steven Rowe Tue, 12 Dec 2006 07:01:30 -0800

Karl Koch wrote:
> The coord(q,d) normalisation is "a score factor based on how many of
> the query terms are found in the specified document." and described
> here:
> 
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
> 
> Does this have a theoretical basis? On what basis was the decision
> made to have it? Does anybody know a paper (in Information Retrieval,
> Information Seeking, etc.) or other more general information about
> this?
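To make the factor concrete, here is a minimal plain-Java sketch of a coord-style factor. It assumes the default similarity computes coord(q,d) as the fraction of query terms matched, i.e. overlap / maxOverlap. The class and method names are hypothetical and this is not Lucene's own code.

```java
import java.util.Set;

/** Hypothetical illustration of a coord(q,d)-style factor; not Lucene's API. */
public class CoordSketch {

    /** Fraction of query terms found in the document (assumed overlap / maxOverlap). */
    public static float coord(int overlap, int maxOverlap) {
        if (maxOverlap == 0) {
            return 0f; // empty query: nothing can match
        }
        return overlap / (float) maxOverlap;
    }

    /** Counts how many distinct query terms appear in the document's term set. */
    public static int overlap(Set<String> queryTerms, Set<String> docTerms) {
        int count = 0;
        for (String term : queryTerms) {
            if (docTerms.contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<String> query = Set.of("lucene", "scoring", "coord");
        Set<String> doc = Set.of("lucene", "scoring", "formula", "similarity");
        int matched = overlap(query, doc);
        // 2 of the 3 query terms occur in the document, so the factor is 2/3
        System.out.println(matched + " / " + query.size() + " -> " + coord(matched, query.size()));
    }
}
```

Other scoring factors being equal, a document that covers more of the query terms gets a proportionally larger multiplier.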


The following is quoted from: Krovetz, R. & Croft, W. B. (1992) Lexical
Ambiguity and Information Retrieval. ACM Transactions on Information
Systems, 10(2): 115-141.

    Many retrieval systems represent documents and queries
    by the words they contain, and base the comparison on
    the number of words they have in common. The more
    words the query and document have in common, the
    higher the document is ranked; this is referred to as
    a "coordination match."  Performance is improved by
    weighting query and document words using frequency
    information from the collection and individual
    document texts [27].

27. Salton, G. & McGill, M. Introduction to Modern Information
Retrieval. McGraw-Hill, New York, 1983.
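The two approaches in the quoted passage can be contrasted in a short sketch. A pure coordination match ranks documents by how many query words they share. The weighted variant instead sums an inverse-document-frequency weight for each shared word. The idf formula used here, ln(N / df), is one common textbook choice assumed for illustration. It is not necessarily the weighting Salton & McGill or Lucene use, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical contrast of coordination-level matching and idf-weighted matching. */
public class CoordinationMatch {

    /** Coordination level: number of query words the document shares. */
    public static int coordinationScore(Set<String> query, Set<String> doc) {
        int shared = 0;
        for (String w : query) {
            if (doc.contains(w)) {
                shared++;
            }
        }
        return shared;
    }

    /** Document frequency of each word across the collection. */
    public static Map<String, Integer> documentFrequencies(List<Set<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Set<String> doc : docs) {
            for (String w : doc) {
                df.merge(w, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Weighted match: sum of idf = ln(N / df) over shared words (assumed weighting). */
    public static double idfScore(Set<String> query, Set<String> doc,
                                  Map<String, Integer> df, int numDocs) {
        double score = 0.0;
        for (String w : query) {
            int d = df.getOrDefault(w, 0);
            if (doc.contains(w) && d > 0) {
                score += Math.log((double) numDocs / d);
            }
        }
        return score;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("the", "bank", "river"),
            Set.of("the", "bank", "loan"),
            Set.of("the", "river", "fish"));
        Set<String> query = Set.of("the", "river", "bank");
        Map<String, Integer> df = documentFrequencies(docs);
        for (int i = 0; i < docs.size(); i++) {
            System.out.printf("doc %d: coordination=%d, idf=%.3f%n", i,
                coordinationScore(query, docs.get(i)),
                idfScore(query, docs.get(i), df, docs.size()));
        }
    }
}
```

In this toy collection "the" occurs in every document. It therefore adds to the coordination count but contributes zero weight under idf. This is the kind of frequency information the passage says improves on a bare coordination match.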

