Weights can't be negative and still be weights. You can have large
(positive) weights on negative training examples (aka "not like this"), but
you can't really have a negative weight.
How "not like this" is encoded depends a lot on the algorithm you are
using. In a (roughly) least squares world such as used by correlational
recommendation systems, you could invert the loss function so that you are
going for maximum squared error for the negative examples. It is likely
that you will have to avoid having the negative examples chase the solution
arround more effectively than the positive examples attract it by using a
looser loss function on negative examples. This also would take into
account the fact that strong negative ratings are actually more like the
right answer than the average case at large.
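To make that concrete, here is a minimal sketch of that idea in Python. The
function name and the cap on the negative-example penalty (neg_cap) are just
assumptions for illustration: positive weights contribute ordinary squared
error, while negative weights contribute inverted squared error that saturates
at the cap, so they can't drag the solution off to infinity.

    def signed_loss(x, weights, ratings, neg_cap=4.0):
        # Positive weights attract the prediction x toward their rating via
        # squared error.  Negative weights repel it (inverted squared error),
        # but the repulsion is capped at neg_cap so a single negative example
        # cannot push the solution arbitrarily far away.
        total = 0.0
        for w, r in zip(weights, ratings):
            err = (x - r) ** 2
            total += w * err if w >= 0 else w * min(err, neg_cap)
        return total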
Averaging is just such a case, since the mean is just the least-squares
solution for positive weights. When you have negative weights that represent
examples you want to avoid, you can simply include those weights in the
weighted average and get the formally "correct" solution. But since the loss
is unbounded below, you get nonsensical solutions such as those in your
examples. For instance, if you have a single negatively weighted example, the
loss is minimized at infinity, because the further you are from that example,
the better. In your recommendations case, this should be handled by putting
a constraint on the results (i.e. bounding them to [1..5]). You also have to
check your result to determine whether it is a loss maximum or a loss
minimum.
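Here is a small sketch of that check in Python (the name best_prediction and
the [1..5] defaults are mine): it uses the plain weighted squared loss and
compares the stationary point (the weighted average, when it is defined and in
range) against both endpoints, returning whichever has the lowest loss.

    def best_prediction(weights, ratings, lo=1.0, hi=5.0):
        # Plain weighted squared loss; negative weights make it unbounded
        # below, which is why we only compare a few constrained candidates.
        def loss(x):
            return sum(w * (x - r) ** 2 for w, r in zip(weights, ratings))

        candidates = [lo, hi]
        total_weight = sum(weights)
        if total_weight != 0:
            avg = sum(w * r for w, r in zip(weights, ratings)) / total_weight
            if lo <= avg <= hi:
                # The weighted average is only a candidate; it may be a loss
                # maximum rather than a minimum, so it still has to compete
                # with the endpoints.
                candidates.append(avg)
        return min(candidates, key=loss)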
Example 1:
-1 similarity to a user with a single 5 rating on a movie and no other
similarities to other users. Weighted average rating on this movie is (-1 *
5) / -1 = 5. But this is the loss MAXimum ... the worst possible answer.
To get the right answer, we have to check both end-points. Within the range
[1..5], the loss is lowest at 1.
Example 2:
-1 similarity to a user with a single 4 rating on a movie and nothing
else. The weighted average is again the loss maximum and occurs at 4. Both
endpoints are better than this, but 1 is further from the negative example
and is thus the best answer.
Example 3:
-1 similarity to one user with a single 4 rating and +1 similarity to
another user with a 4 rating, both on the same movie. In this case, the
weighted average is undefined (0/0). This occurs because the loss function
is totally flat and has no optimum.
Example 4:
-1 similarity to a rating of 2, +1 similarity to a rating of 4 and +1
similarity to a rating of 5. The weighted average is (-2 + 4 + 5) /
(-1 + 1 + 1) = 7 / 1 = 7. The loss function is minimized at 7, but that is
outside our constraints. Of the two end-points, 5 is the better answer.
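Running the sketch above on these four cases reproduces the answers
described (in example 3 the loss is flat, so the tie between the endpoints is
broken arbitrarily):

    print(best_prediction([-1], [5]))              # example 1 -> 1.0
    print(best_prediction([-1], [4]))              # example 2 -> 1.0
    print(best_prediction([-1, 1], [4, 4]))        # example 3 -> flat loss, tie
    print(best_prediction([-1, 1, 1], [2, 4, 5]))  # example 4 -> 5.0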
On Tue, Feb 23, 2010 at 3:49 AM, Sean Owen <[email protected]> wrote:
>
> Ted do you have any standard advice about how people do weighted
> averages when weights are negative?
--
Ted Dunning, CTO
DeepDyve