Chuck Williams wrote:
I believe the biggest problem with Lucene's approach relative to the pure vector space model is that Lucene does not properly normalize. The pure vector space model implements a cosine in the strictly positive sector of the coordinate space. This is guaranteed intrinsically to be between 0 and 1, and produces scores that can be compared across distinct queries (i.e., "0.8" means something about the result quality independent of the query).

I question whether such scores are more meaningful. Yes, they would be guaranteed to be between zero and one, but would 0.8 really be meaningful? I don't think so. Do you have pointers to research that demonstrates this, e.g., that when such a scoring method is used, thresholding by score is useful across queries?
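
To make the quantity under discussion concrete, here is a minimal sketch of the cosine between two non-negative term-weight vectors. It is not Lucene's scoring code; the function name `cosine`, the dict-based sparse vectors, and the toy weights are all assumptions made up for illustration.

```python
import math

def cosine(query, doc):
    """Cosine of the angle between two sparse term-weight vectors.

    Both arguments are hypothetical {term: weight} dicts with
    non-negative weights, so the result falls between 0 and 1.
    """
    # Dot product over the terms the query contains.
    dot = sum(w * doc.get(t, 0.0) for t, w in query.items())
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    d_norm = math.sqrt(sum(w * w for w in doc.values()))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0  # an empty vector has no direction; treat as no match
    return dot / (q_norm * d_norm)

# Toy data (made up): one document scored against two different queries.
doc = {"lucene": 2.0, "score": 1.0, "index": 1.0}
short_query = {"lucene": 1.0}
long_query = {"lucene": 1.0, "score": 1.0, "vector": 1.0}

s1 = cosine(short_query, doc)
s2 = cosine(long_query, doc)
print(s1, s2)
```

Both values land between 0 and 1, as the normalization guarantees. The sketch only shows that the bound holds; whether equal values from different queries indicate equal result quality is exactly the open question.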


Doug
