Re: Question about evaluating a Recommender System
It is true that a process based on user-user similarity only won't be able to recommend item 4 in this example. This is a drawback of the algorithm and not something that can be worked around. You could try not to choose this item in the test set, but then the test does not quite reflect reality.

If you just mean that compressing the range of preference values improves RMSE in absolute terms, yes, of course it does. But not in relative terms. There is nothing inherently better or worse about a small range in this example. RMSE is a fine evaluation metric, but you can also consider mean average precision.

Sean
Re: Question about evaluating a Recommender System
Thank you for your reply. I think the evaluation process involves randomly choosing the evaluation proportion. The problem is that I always get the best result when I set neighbors to 2, which seems unreasonable to me, since there should be many test cases that the recommender system couldn't predict at all. So why did I still get a valid result? How does Mahout handle this case?

Sorry I didn't make myself clear on the second question. Here is the problem: I have a set of inferred preferences ranging from 0 to 1000, but I want to map them to 1-5. There are many possible ways to do the mapping. Take a simple example, where the mapping rule is:

if (inferred_preference < 995) preference = 1; else preference = inferred_preference - 995;

You can see that this is a really bad mapping algorithm, but if we feed the generated preferences to Mahout, it is going to give me a really nice result, because most of the preferences are 1. So is there any other metric to evaluate this?

Any help will be highly appreciated.

Best Regards,
Jimmy

Zhongduo Lin (Jimmy)
MASc candidate in ECE department
University of Toronto
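As a point of contrast with the thresholding rule above, here is a minimal sketch of a plain linear rescaling from the inferred 0-1000 range onto 1-5. The helper name mapPreference is hypothetical, not something from the thread or from Mahout:

// Hypothetical alternative: linearly rescale an inferred preference in [0, 1000]
// onto [1, 5], so the whole target range is actually used.
static float mapPreference(double inferredPreference) {
  double clamped = Math.max(0.0, Math.min(1000.0, inferredPreference));
  return (float) (1.0 + 4.0 * (clamped / 1000.0));
}

Under a rescaling like this, the evaluator's error is reported against the full 1-5 range of the data, so a small score cannot come simply from having squashed almost everything onto a single value.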
Re: Question about evaluating a Recommender System
It may be true that the results are best with a neighborhood size of 2. Why is that surprising? Very similar people, by nature, rate similar things, which makes the things you held out of a user's test set likely to be found in the recommendations.

The mapping you suggest is not that sensible, yes, since almost everything maps to 1. Not surprisingly, most of your predictions are near 1. That's better in an absolute sense, but RMSE is worse relative to the variance of the data set. This is not a good mapping -- or else RMSE is not a very good metric here. So don't do one of those two things. Try mean average precision for a metric that is not directly related to the prediction values.
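For context, as far as I know the non-distributed Taste API does not ship a mean-average-precision evaluator, but precision and recall at N via GenericRecommenderIRStatsEvaluator are in the same spirit, since they ignore the predicted values and only look at what ends up in the top-N list. A rough sketch; printPrecisionRecall is a hypothetical helper and the parameter choices are illustrative:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;

// Precision and recall at 10: for each user, the top preferences are held out as the
// "relevant" items and the recommender is asked for 10 recommendations.
static void printPrecisionRecall(RecommenderBuilder builder, DataModel model) throws TasteException {
  RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
  IRStatistics stats = evaluator.evaluate(
      builder, null, model, null,
      10,                                                   // top-10 recommendations per user
      GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,  // let Mahout pick the relevance threshold
      1.0);                                                 // evaluate with 100% of the users
  System.out.println("Precision@10: " + stats.getPrecision());
  System.out.println("Recall@10: " + stats.getRecall());
}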
Re: Question about evaluating a Recommender System
Thank you for the quick response. I agree that a neighborhood size of 2 will make the predictions more sensible. But my concern is that a neighborhood size of 2 can only predict a very small proportion of the preferences for each user. Looking at the previous example, how can it predict item 4 if item 4 happens to be chosen as part of the test set? I think this is quite common in my case as well as for Amazon or eBay, since the ratings are very sparse. So I just don't know how the evaluation can still run.

User 1 rated items 1, 2, 3, 4
Neighbour 1 of user 1 rated items 1, 2
Neighbour 2 of user 1 rated items 1, 3

I wouldn't expect the root mean square error to perform differently from the absolute difference, since in that case most of the predictions are close to 1, resulting in a near-zero error whether I use absolute difference or RMSE. How can I tell that RMSE is worse relative to the variance of the data set using Mahout?

Unfortunately, I got an error using the precision and recall evaluation method; I guess that's because the data are too sparse.

Best Regards,
Jimmy
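On the "relative to the variance" point, Mahout does not report such a number directly as far as I know, but one rough way to put the RMSE in context is to divide it by the standard deviation of the observed ratings. A sketch; normalizedRmse is a hypothetical helper and the normalisation convention is just one choice:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.common.FullRunningAverageAndStdDev;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.eval.RMSRecommenderEvaluator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.Preference;

// RMSE on a random 10% hold-out, divided by the standard deviation of all observed
// ratings, so that an error on a 1-2 scale and one on a 1-5 scale become comparable.
static double normalizedRmse(RecommenderBuilder builder, DataModel model) throws TasteException {
  RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
  double rmse = rmseEvaluator.evaluate(builder, null, model, 0.9, 1.0);

  FullRunningAverageAndStdDev ratingStats = new FullRunningAverageAndStdDev();
  LongPrimitiveIterator userIDs = model.getUserIDs();
  while (userIDs.hasNext()) {
    for (Preference pref : model.getPreferencesFromUser(userIDs.nextLong())) {
      ratingStats.addDatum(pref.getValue());
    }
  }
  return rmse / ratingStats.getStandardDeviation();
}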
Re: Question about evaluating a Recommender System
You can't predict item 4 in that case. That shows the weakness of neighborhood approaches for sparse data. That's pretty much the story -- it's all working correctly. Maybe you should not use this approach.
Re: Question about evaluating a Recommender System
Thank you for your reply. So in the case that item 4 is in the test set, will Mahout just not take it into consideration, or will it generate some preference instead? And is there any way to evaluate the mapping algorithm in Mahout?

Best Regards,
Jimmy
Re: Question about evaluating a Recommender System
It may be selected as a test item. Other algorithms can predict the '4'. The test process is random so as not to favor one algorithm. I think you are just arguing that the algorithm you are using isn't good for your data -- so just don't use it. Is that not the answer?

I don't know what you mean by the mapping algorithm.
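For what it's worth, a minimal sketch of one such alternative in the same API: a matrix-factorization recommender, whose estimates do not depend on the target item appearing in a small fixed neighbourhood. The helper name estimateWithFactorization and the factorization parameters (10 features, lambda 0.05, 20 iterations) are illustrative only:

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

// Factorize the rating matrix into 10 latent features with ALS; an estimate is then
// available for any user/item pair represented somewhere in the training data, not
// only for pairs covered by a small user neighbourhood.
static float estimateWithFactorization(DataModel model, long userID, long itemID) throws TasteException {
  Recommender recommender = new SVDRecommender(model, new ALSWRFactorizer(model, 10, 0.05, 20));
  return recommender.estimatePreference(userID, itemID);
}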
Re: Question about evaluating a Recommender System
Sorry for the confusion. I am comparing different algorithms, including both user-based and item-based, so I think it would be useful to know how Mahout deals with such a situation in order to give a fairer comparison, because for now the user-based approaches give me better results.

By mapping algorithm, I mean a way to map my inferred preferences (1-1000) to a smaller scale (1-5). Thank you for your help.

Best Regards,
Jimmy
Re: Question about evaluating a Recommender System
AFAIK, the recommender would predict a NaN, which will be ignored by the evaluator. However, I am not sure if there is any way to know how many of these were actually produced in the evaluation step, that is, something like the count of predictions with a NaN value.

Cheers,
Alex

--
Alejandro Bellogin Kouki
http://rincon.uam.es/dir?cw=435275268554687
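If it helps, one rough way to get such a count yourself is to do a hold-out split manually and ask the trained recommender for an estimate of each held-out pair. A sketch under clearly stated assumptions: countNonEstimable is a hypothetical helper, and the hold-out rule (last preference per user) is only for illustration, not what Mahout's evaluator does internally:

import java.util.ArrayList;
import java.util.List;
import org.apache.mahout.cf.taste.common.NoSuchItemException;
import org.apache.mahout.cf.taste.common.NoSuchUserException;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericPreference;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.recommender.Recommender;

// Hold out the last preference of every user (if they have more than one), rebuild a
// training model from the rest, and count how many held-out pairs get no estimate.
static void countNonEstimable(DataModel model, RecommenderBuilder builder) throws TasteException {
  FastByIDMap<PreferenceArray> trainingData = new FastByIDMap<PreferenceArray>();
  List<Preference> heldOut = new ArrayList<Preference>();

  LongPrimitiveIterator users = model.getUserIDs();
  while (users.hasNext()) {
    long userID = users.nextLong();
    PreferenceArray prefs = model.getPreferencesFromUser(userID);
    List<Preference> training = new ArrayList<Preference>();
    for (int i = 0; i < prefs.length(); i++) {
      Preference p = new GenericPreference(userID, prefs.getItemID(i), prefs.getValue(i));
      if (i == prefs.length() - 1 && prefs.length() > 1) {
        heldOut.add(p);
      } else {
        training.add(p);
      }
    }
    trainingData.put(userID, new GenericUserPreferenceArray(training));
  }

  Recommender recommender = builder.buildRecommender(new GenericDataModel(trainingData));

  int noEstimate = 0;
  for (Preference p : heldOut) {
    try {
      if (Float.isNaN(recommender.estimatePreference(p.getUserID(), p.getItemID()))) {
        noEstimate++;
      }
    } catch (NoSuchUserException e) {
      noEstimate++;   // a user or item missing from the training split also counts as "no estimate"
    } catch (NoSuchItemException e) {
      noEstimate++;
    }
  }
  System.out.println(noEstimate + " of " + heldOut.size() + " held-out preferences had no estimate");
}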
Re: Question about evaluating a Recommender System
This accounts for why a neighborhood size of 2 always gives me the best result. Thank you!

Best Regards,
Jimmy

Zhongduo Lin (Jimmy)
MASc candidate in ECE department
University of Toronto
Re: Question about evaluating a Recommender System
Ah, yes, that's right. If you have a lot of these values, the test is really not valid. It may look 'better' but isn't, for just this reason. You want to make sure the result doesn't have many of these, or else you would discard it. Look for log lines like "Unable to recommend in X cases".
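If those log lines are not showing up at all, it is often just a logging-configuration issue. A minimal sketch, assuming the project routes slf4j through the log4j binding (the appender layout and file name are illustrative, and whether this applies depends on what is actually on the classpath):

# log4j.properties -- make the evaluator's INFO-level messages visible on the console
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d %-5p %c - %m%n
log4j.logger.org.apache.mahout.cf.taste=INFO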
Re: Question about evaluating a Recommender System
I see. Thank you for the information! Any idea about how to evaluate the method of mapping inferred preferences to a smaller scale with Mahout?

Best Regards,
Jimmy

Zhongduo Lin (Jimmy)
MASc candidate in ECE department
University of Toronto
Question about evaluating a Recommender System
Hi All,

I am using Mahout to build a user-based recommender system (RS). The evaluation method I am using is AverageAbsoluteDifferenceRecommenderEvaluator, which, according to Mahout in Action, randomly sets aside some existing preferences and calculates the difference between the predicted value and the real one.

The first question I have is that in a user-based RS, if we choose a small number of neighbours, it is quite possible that the prediction is not available at all. Here is an example:

User 1 rated items 1, 2, 3, 4
Neighbour 1 of user 1 rated items 1, 2
Neighbour 2 of user 1 rated items 1, 3

In the case above, the number of neighbours is two, so if we take out the rating of user 1 for item 4, there is no way to predict it. How will Mahout deal with such a problem?

Also, I am trying to map inferred preferences to a scale of 1-5. The problem is that if I simply map all the preferences to 1-2, then I will get a really nice evaluation result (almost 0), but you can easily see that this is not the right way to do it. So I guess the question is whether there is another way to evaluate the preference mapping algorithm.

Any help will be highly appreciated.

Best Regards,
Jimmy
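For reference, a minimal sketch of the kind of evaluation setup described above, assuming the Mahout 0.x Taste API, a ratings file in the usual userID,itemID,value format, and Pearson correlation as the user similarity. The class name, file name, and similarity choice are illustrative, not taken from the thread:

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserBasedEvaluation {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // User-based recommender with a deliberately small neighborhood, as discussed above.
    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, dataModel);
        return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
      }
    };

    // Train on roughly 90% of each user's preferences and score the held-out 10%.
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
    System.out.println("Average absolute difference: " + score);
  }
}

As discussed in the thread, held-out preferences that the recommender cannot estimate come back as NaN and are left out of the reported average, so a low score with a tiny neighborhood may be based on only a small fraction of the test cases.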