On Tue, Feb 23, 2010 at 4:16 PM, Tamas Jambor <[email protected]> wrote:
> ok. I understand now. but you would you express this loss function
> mathematically?

Did you mean to ask "HOW would I express this"?

For many applications where averages seem usable with positive weights, I
would use squared distance from positive examples and negative squared
distance from negative examples.

> also there is one example when it wouldn't work:
>
> -1 similarity to one user with a single 1 rating and +1 similarity to
> another user with a 5 rating. In this case, the weighted average is
> undefined. but in practice this would be an easy 3.

3 is incorrect, actually. The fact that the sum of the weights is zero
does, indeed, indicate that there is no single optimum. Since the numerator
in the weighted average is non-zero, we know that the slope of the loss
function is constant and negative. This means that the best choice within
our constraints is to pick 5. This is better than 3 because it is both
farther from the negatively weighted 1 rating and closer to the positively
weighted 5 rating. Because of the peculiarity of squared error, 6 would be
even better because moving a step further from the 1 outweighs the movement
away from the 5 (there is a small sketch of this at the end of this
message).

If we used mean absolute error, we would lose the applicability of weighted
averages, but we would sometimes get more sensible answers because we
wouldn't over-weight the outliers. This is known as "using an L_1 loss
function" (as opposed to an L_2 loss). The median is an example of a
statistic motivated by the L_1 loss.

For L_1 loss in your example, any result less than 1 has the same loss,
loss decreases rapidly between 1 and 5 and then is constant to the right
of 5. Within our constraints, the optimum is 5.
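To make the squared-error case concrete, here is a throwaway Python sketch
(just arithmetic, nothing from the Mahout code, and the function name is
mine). It puts weight -1 on the squared distance to the 1 rating and weight
+1 on the squared distance to the 5 rating:

    # Loss for the example: -1 * (x - 1)^2  +  1 * (x - 5)^2.
    # The quadratic terms cancel, leaving the linear function 24 - 8x.
    def loss(x):
        return -(x - 1) ** 2 + (x - 5) ** 2

    for x in [1, 3, 5, 6]:
        print(x, loss(x))
    # prints: 1 16, 3 0, 5 -16, 6 -24

The loss is linear with slope -8, so it just keeps falling as x grows;
5 is the best choice on the 1..5 scale, and 6 would score even better if
the scale allowed it.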

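And the same example under L_1 loss (again just a sketch of my own):

    # Loss for the example under L_1: -1 * |x - 1|  +  1 * |x - 5|.
    def loss(x):
        return -abs(x - 1) + abs(x - 5)

    for x in [0, 1, 3, 5, 7]:
        print(x, loss(x))
    # prints: 0 4, 1 4, 3 0, 5 -4, 7 -4

This shows the shape I described: constant to the left of 1, decreasing
between 1 and 5, and constant to the right of 5, so the constrained
optimum is 5.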