On Mon, Feb 22, 2010 at 4:18 PM, Tamas Jambor <[email protected]> wrote:
>
>> What if the weights are 1,1,-1,-1? The estimate is -2 then. This is
>> why I say this won't work.
>>
>
> you could trim the result so you would get 1, which is the same as what
> you get with your approach.
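To make concrete what trimming is papering over, here is a minimal sketch
of the plain weighted average and the clamp. The class and method names
are made up for illustration, and it assumes a 1-5 rating scale:

  public class WeightedAverageSketch {

    // Plain weighted average: sum(s_i * r_i) / sum(s_i).
    static double estimate(double[] ratings, double[] sims) {
      double num = 0.0;
      double den = 0.0;
      for (int i = 0; i < ratings.length; i++) {
        num += sims[i] * ratings[i];
        den += sims[i];
      }
      return num / den; // negative weights can shrink den towards 0
    }

    // The "trim" fix: clamp the estimate into the valid rating range.
    static double clamp(double x, double min, double max) {
      return Math.max(min, Math.min(max, x));
    }

    public static void main(String[] args) {
      double[] ratings = {5.0, 5.0, 1.0, 1.0};   // assumed 1..5 scale
      double[] sims    = {1.0, 1.0, -0.9, -0.9}; // two negative weights
      double raw = estimate(ratings, sims);      // 8.2 / 0.2 = 41.0
      System.out.println(raw);
      System.out.println(clamp(raw, 1.0, 5.0));  // 5.0 after trimming
    }
  }

Clamping rescues the output range, but the intermediate value was
meaningless to begin with, and that is the real objection.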
What about the case where you have just one or two similar items, all with
similarity 0? Then both numerator and denominator are 0, so the estimate
is 0/0 -- undefined. That could be patched too, but this is why defining
it this way feels problematic: you can't meaningfully take a weighted
average with negative weights.

> if you take cosine similarity, 0 means that vectors are independent,
> that is also an implication that there is no mutual information.
> although cosine cannot get negative in this case.

The cosine similarity can be negative, but yes, 0 here also means no
relation. This isn't true of, say, a Euclidean distance-based measure.

> evaluate. The first one is your approach, and the second one is the one
> I mentioned in the previous email.

How are you dealing with negative or undefined predictions, though? I am
also not sure what defining it this way does to the accuracy of the
estimated preference values, which is what the evaluator would test; this
definition tends to push estimates towards negative values. I could
believe it works a little better for your data set, so you should use
your variation, especially if you only care about precision and recall.
I don't know that this is going to be better in general -- honestly, I
haven't studied this. Failing that, I am just having a hard time
implementing this as an ill-defined weighted average, no matter what it
seems to do to one data set.

Is there no third way? Maybe someone can think of a standard solution to
this.
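For what it's worth, the early user-based CF literature has a common
convention that sidesteps both problems: mean-center the ratings and
normalize by the sum of *absolute* similarities (the GroupLens-style
formula from Resnick et al.). A minimal sketch, again with made-up names;
this is one possible third way, not a claim about either variant above:

  public class MeanCenteredSketch {

    // prediction = targetMean + sum(s_i * (r_i - mean_i)) / sum(|s_i|)
    // Negative similarities are allowed: a neighbor who disliked the
    // item pushes the estimate up, and the |s_i| denominator stays
    // positive.
    static double estimate(double targetMean,
                           double[] neighborRatings,
                           double[] neighborMeans,
                           double[] sims) {
      double num = 0.0;
      double den = 0.0;
      for (int i = 0; i < neighborRatings.length; i++) {
        num += sims[i] * (neighborRatings[i] - neighborMeans[i]);
        den += Math.abs(sims[i]);
      }
      if (den == 0.0) {
        // All similarities 0: no information, so fall back to the
        // mean instead of returning NaN.
        return targetMean;
      }
      return targetMean + num / den;
    }

    public static void main(String[] args) {
      // Weights 1, 1, -1, -1 no longer produce a nonsense estimate:
      double[] ratings = {4.0, 4.0, 2.0, 2.0};
      double[] means   = {3.0, 3.0, 3.0, 3.0};
      double[] sims    = {1.0, 1.0, -1.0, -1.0};
      System.out.println(estimate(3.0, ratings, means, sims)); // 4.0
    }
  }

With weights 1, 1, -1, -1 the denominator is 4 rather than 0, and the
all-zero case degrades gracefully. I have no idea how it does on your
data set, but it is at least well-defined.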
