Re: Question about evaluating a Recommender System
It is true that a process based on user-user similarity only won't be able to recommend item 4 in this example. This is a drawback of the algorithm and not something that can be worked around. You could try not to choose this item in the test set, but then the test does not quite reflect reality.

If you just mean that compressing the range of preference values improves RMSE in absolute terms, yes, of course it does. But not in relative terms. There is nothing inherently better or worse about a small range in this example. RMSE is a fine evaluation metric, but you can also consider mean average precision.

Sean
Re: Question about evaluating a Recommender System
Thank you for your reply. I think the evaluation process involves randomly choosing the evaluation proportion. The problem is that I always get the best result when I set neighbors to 2, which seems unreasonable to me, since there should be many test cases that the recommender system couldn't predict at all. So why did I still get a valid result? How does Mahout handle this case?

Sorry I didn't make myself clear on the second question. Here is the problem: I have a set of inferred preferences ranging from 0 to 1000, but I want to map them to 1-5. There are many possible ways to do the mapping. Take a simple example, where the mapping rule is:

if (inferred_preference < 995) preference = 1; else preference = inferred_preference - 995;

You can see that this is a really bad mapping algorithm, but if we feed the generated preferences to Mahout, it is going to give me a really nice result, because most of the preferences are 1. So is there any other metric to evaluate this?

Any help will be highly appreciated.

Best Regards,
Jimmy

Zhongduo Lin (Jimmy)
MASc candidate in ECE department
University of Toronto
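As a point of contrast with the thresholding rule above, here is a minimal sketch of a plain linear rescaling from the inferred 0-1000 range onto 1-5. The helper name mapPreference is hypothetical, not something from the thread or from Mahout:

// Hypothetical alternative: linearly rescale an inferred preference in [0, 1000]
// onto [1, 5], so the whole target range is actually used.
static float mapPreference(double inferredPreference) {
  double clamped = Math.max(0.0, Math.min(1000.0, inferredPreference));
  return (float) (1.0 + 4.0 * (clamped / 1000.0));
}

Under a rescaling like this, the evaluator's error is reported against the full 1-5 range of the data, so a small score cannot come simply from having squashed almost everything onto a single value.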
Re: Question about evaluating a Recommender System
It may be true that the results are best with a neighborhood size of 2. Why is that surprising? Very similar people, by nature, rate similar things, which makes the things you held out of a user's test set likely to be found in the recommendations.

The mapping you suggest is not that sensible, yes, since almost everything maps to 1. Not surprisingly, most of your predictions are near 1. That's better in an absolute sense, but RMSE is worse relative to the variance of the data set. This is not a good mapping -- or else RMSE is not a very good metric here. So don't do one of those two things. Try mean average precision for a metric that is not directly related to the prediction values.
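For context, as far as I know the non-distributed Taste API does not ship a mean-average-precision evaluator, but precision and recall at N via GenericRecommenderIRStatsEvaluator are in the same spirit, since they ignore the predicted values and only look at what ends up in the top-N list. A rough sketch; printPrecisionRecall is a hypothetical helper and the parameter choices are illustrative:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;

// Precision and recall at 10: for each user, the top preferences are held out as the
// "relevant" items and the recommender is asked for 10 recommendations.
static void printPrecisionRecall(RecommenderBuilder builder, DataModel model) throws TasteException {
  RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
  IRStatistics stats = evaluator.evaluate(
      builder, null, model, null,
      10,                                                   // top-10 recommendations per user
      GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,  // let Mahout pick the relevance threshold
      1.0);                                                 // evaluate with 100% of the users
  System.out.println("Precision@10: " + stats.getPrecision());
  System.out.println("Recall@10: " + stats.getRecall());
}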
Re: Question about evaluating a Recommender System
Thank you for the quick response. I agree that a neighborhood size of 2 will make the predictions more sensible. But my concern is that a neighborhood size of 2 can only predict a very small proportion of the preferences for each user. Looking at the previous example, how can it predict item 4 if item 4 happens to be chosen as part of the test set? I think this is quite common in my case as well as for Amazon or eBay, since the ratings are very sparse. So I just don't know how the evaluation can still run.

User 1 rated items 1, 2, 3, 4
Neighbour 1 of user 1 rated items 1, 2
Neighbour 2 of user 1 rated items 1, 3

I wouldn't expect the root mean square error to perform differently from the absolute difference, since in that case most of the predictions are close to 1, resulting in a near-zero error whether I use absolute difference or RMSE. How can I tell that RMSE is worse relative to the variance of the data set using Mahout?

Unfortunately, I got an error using the precision and recall evaluation method; I guess that's because the data are too sparse.

Best Regards,
Jimmy
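On the "relative to the variance" point, Mahout does not report such a number directly as far as I know, but one rough way to put the RMSE in context is to divide it by the standard deviation of the observed ratings. A sketch; normalizedRmse is a hypothetical helper and the normalisation convention is just one choice:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.common.FullRunningAverageAndStdDev;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.Preference;

// RMSE on a random 10% hold-out, divided by the standard deviation of all observed
// ratings, so that an error on a 1-2 scale and one on a 1-5 scale become comparable.
static double normalizedRmse(RecommenderBuilder builder, DataModel model) throws TasteException {
  RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
  double rmse = rmseEvaluator.evaluate(builder, null, model, 0.9, 1.0);

  FullRunningAverageAndStdDev ratingStats = new FullRunningAverageAndStdDev();
  LongPrimitiveIterator userIDs = model.getUserIDs();
  while (userIDs.hasNext()) {
    for (Preference pref : model.getPreferencesFromUser(userIDs.nextLong())) {
      ratingStats.addDatum(pref.getValue());
    }
  }
  return rmse / ratingStats.getStandardDeviation();
}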
Re: Question about evaluating a Recommender System
You can't predict item 4 in that case. That shows the weakness of neighborhood approaches for sparse data. That's pretty much the story -- it's all working correctly. Maybe you should not use this approach.
Re: Question about evaluating a Recommender System
Thank you for your reply. So in the case that item 4 is in the test set, will Mahout just not take it into consideration, or will it generate some preference instead? And is there any way to evaluate the mapping algorithm in Mahout?

Best Regards,
Jimmy
Re: Question about evaluating a Recommender System
It may be selected as a test item. Other algorithms can predict the '4'. The test process is random so as not to favor one algorithm. I think you are just arguing that the algorithm you are using isn't good for your data -- so just don't use it. Is that not the answer?

I don't know what you mean by the mapping algorithm.
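For what it's worth, a minimal sketch of one such alternative in the same API: a matrix-factorization recommender, whose estimates do not depend on the target item appearing in a small fixed neighbourhood. The helper name estimateWithFactorization and the factorization parameters (10 features, lambda 0.05, 20 iterations) are illustrative only:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

// Factorize the rating matrix into 10 latent features with ALS; an estimate is then
// available for any user/item pair represented somewhere in the training data, not
// only for pairs covered by a small user neighbourhood.
static float estimateWithFactorization(DataModel model, long userID, long itemID) throws TasteException {
  Recommender recommender = new SVDRecommender(model, new ALSWRFactorizer(model, 10, 0.05, 20));
  return recommender.estimatePreference(userID, itemID);
}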
Re: Question about evaluating a Recommender System
Sorry for the confusion. I am comparing different algorithms, including both user-based and item-based, so I think it would be useful to know how Mahout deals with such a situation in order to give a fairer comparison, because for now the user-based approaches give me better results.

By mapping algorithm, I mean a way to map my inferred preferences (1-1000) to a smaller scale (1-5). Thank you for your help.

Best Regards,
Jimmy
Re: Question about evaluating a Recommender System
AFAIK, the recommender would predict a NaN, which will be ignored by the evaluator. However, I am not sure if there is any way to know how many of these were actually produced in the evaluation step, that is, something like the count of predictions with a NaN value.

Cheers,
Alex

--
Alejandro Bellogin Kouki
http://rincon.uam.es/dir?cw=435275268554687
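If it helps, one rough way to get such a count yourself is to do a hold-out split manually and ask the trained recommender for an estimate of each held-out pair. A sketch under clearly stated assumptions: countNonEstimable is a hypothetical helper, and the hold-out rule (last preference per user) is only for illustration, not what Mahout's evaluator does internally:

import java.util.ArrayList;
import java.util.List;
import org.apache.mahout.cf.taste.common.NoSuchItemException;
import org.apache.mahout.cf.taste.common.NoSuchUserException;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericPreference;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.recommender.Recommender;

// Hold out the last preference of every user (if they have more than one), rebuild a
// training model from the rest, and count how many held-out pairs get no estimate.
static void countNonEstimable(DataModel model, RecommenderBuilder builder) throws TasteException {
  FastByIDMap<PreferenceArray> trainingData = new FastByIDMap<PreferenceArray>();
  List<Preference> heldOut = new ArrayList<Preference>();

  LongPrimitiveIterator users = model.getUserIDs();
  while (users.hasNext()) {
    long userID = users.nextLong();
    PreferenceArray prefs = model.getPreferencesFromUser(userID);
    List<Preference> training = new ArrayList<Preference>();
    for (int i = 0; i < prefs.length(); i++) {
      Preference p = new GenericPreference(userID, prefs.getItemID(i), prefs.getValue(i));
      if (i == prefs.length() - 1 && prefs.length() > 1) {
        heldOut.add(p);
      } else {
        training.add(p);
      }
    }
    trainingData.put(userID, new GenericUserPreferenceArray(training));
  }

  Recommender recommender = builder.buildRecommender(new GenericDataModel(trainingData));

  int noEstimate = 0;
  for (Preference p : heldOut) {
    try {
      if (Float.isNaN(recommender.estimatePreference(p.getUserID(), p.getItemID()))) {
        noEstimate++;
      }
    } catch (NoSuchUserException e) {
      noEstimate++;   // a user or item missing from the training split also counts as "no estimate"
    } catch (NoSuchItemException e) {
      noEstimate++;
    }
  }
  System.out.println(noEstimate + " of " + heldOut.size() + " held-out preferences had no estimate");
}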
Re: Question about evaluating a Recommender System
This accounts for why a neighborhood size of 2 always gives me the best result. Thank you!

Best Regards,
Jimmy

Zhongduo Lin (Jimmy)
MASc candidate in ECE department
University of Toronto
Re: Question about evaluating a Recommender System
Ah, yes, that's right. If you have a lot of these values, the test is really not valid. It may look 'better' but isn't, for just this reason. You want to make sure the result doesn't have many of these, or else you would discard it. Look for log lines like "Unable to recommend in X cases".
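If those log lines are not showing up at all, it is often just a logging-configuration issue. A minimal sketch, assuming the project routes slf4j through the log4j binding (the appender layout and file name are illustrative, and whether this applies depends on what is actually on the classpath):

# log4j.properties -- make the evaluator's INFO-level messages visible on the console
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %-5p %c - %m%n
log4j.logger.org.apache.mahout.cf.taste=INFO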
Re: Question about evaluating a Recommender System
I see. Thank you for the information! Any idea about how to evaluate the method of mapping inferred preferences to a smaller scale with Mahout?

Best Regards,
Jimmy

Zhongduo Lin (Jimmy)
MASc candidate in ECE department
University of Toronto
Question about evaluating a Recommender System
Hi All,

I am using Mahout to build a user-based recommender system (RS). The evaluation method I am using is AverageAbsoluteDifferenceRecommenderEvaluator, which, according to Mahout in Action, randomly sets aside some existing preferences and calculates the difference between the predicted value and the real one.

The first question I have is that in a user-based RS, if we choose a small number of neighbours, it is quite possible that the prediction is not available at all. Here is an example:

User 1 rated items 1, 2, 3, 4
Neighbour 1 of user 1 rated items 1, 2
Neighbour 2 of user 1 rated items 1, 3

In the case above, the number of neighbours is two, so if we take out the rating of user 1 for item 4, there is no way to predict it. How will Mahout deal with such a problem?

Also, I am trying to map inferred preferences to a scale of 1-5. The problem is that if I simply map all the preferences to 1-2, then I will get a really nice evaluation result (almost 0), but you can easily see that this is not the right way to do it. So I guess the question is whether there is another way to evaluate the preference mapping algorithm.

Any help will be highly appreciated.

Best Regards,
Jimmy
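For reference, a minimal sketch of the kind of evaluation setup described above, assuming the Mahout 0.x Taste API, a ratings file in the usual userID,itemID,value format, and Pearson correlation as the user similarity. The class name, file name, and similarity choice are illustrative, not taken from the thread:

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserBasedEvaluation {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // User-based recommender with a deliberately small neighborhood, as discussed above.
    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, dataModel);
        return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
      }
    };

    // Train on roughly 90% of each user's preferences and score the held-out 10%.
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
    System.out.println("Average absolute difference: " + score);
  }
}

As discussed in the thread, held-out preferences that the recommender cannot estimate come back as NaN and are left out of the reported average, so a low score with a tiny neighborhood may be based on only a small fraction of the test cases.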