[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169332#comment-13169332 ]

Sean Owen commented on MAHOUT-906:
----------------------------------

For the IR precision/recall evaluation, if you *do* have preference values, 
then picking the "relevant" items by recency doesn't work. It's not true that 
these are the best answers the recommender can give. (It's not even quite true 
that the highest-rated items are the best recommendations!) Imagine 
you just rated several movies you hate. You would not want to score a 
recommender more highly because it recommends these.
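
For reference, here is roughly how the IR test is driven today through the Taste 
API: the "relevant" items for each user are their top-rated preferences, subject 
to a relevance threshold, not their most recent ones. The file name and 
recommender configuration below are placeholders, not anything prescribed by this 
issue, and imports from org.apache.mahout.cf.taste.* / java.io are omitted.

    // Fragment sketching the existing IR evaluation flow; the data file and
    // recommender choice are placeholders.
    DataModel model = new FileDataModel(new File("ratings.csv"));
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel dataModel) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, dataModel);
        return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
      }
    };
    // Relevant items are each user's highest-rated items above the threshold;
    // CHOOSE_THRESHOLD asks the evaluator to pick a per-user threshold itself.
    IRStatistics stats = evaluator.evaluate(builder, null, model, null, 5,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.println(stats.getPrecision() + " / " + stats.getRecall());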

If you *don't* have preference values, then picking the relevant items is 
arbitrary. Picking by time is as good as randomly picking. But, that means 
you're not expecting to gain anything by splitting by time. Any selection is 
about equivalent.

So I think this doesn't help your use of the IR test, and creates a bad test 
for other use cases. I *do* think it could be meaningful for the estimation 
test.

Even if I thought it were a good test, I don't see the need for a new class. 
This is merely extracting how "relevantIDs" is computed. Yes, you can refactor 
that as you say, but then how does anything else change?
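
If it were extracted, the whole change would amount to something like the 
hypothetical sketch below (the interface and method names are made up, not 
existing Mahout API):

    // Hypothetical sketch only, to show the scope of the proposed extraction.
    public interface RelevantItemsSelector {
      // Decide which of a user's preferences count as "relevant" for the IR test.
      FastIDSet getRelevantItemIDs(long userID, int at, double relevanceThreshold,
                                   DataModel dataModel) throws TasteException;
    }

A by-rating implementation reproduces what GenericRecommenderIRStatsEvaluator 
does now; a by-recency implementation is exactly the selection questioned above, 
and nothing else in the evaluator changes.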
                
> Allow collaborative filtering evaluators to use custom logic in splitting 
> data set
> ----------------------------------------------------------------------------------
>
>                 Key: MAHOUT-906
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-906
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Anatoliy Kats
>            Priority: Minor
>              Labels: features
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I want to start a discussion about factoring out the logic used in splitting 
> the data set into training and testing.  Here is how things stand:  There are 
> two independent evaluator classes.  AbstractDifferenceRecommenderEvaluator 
> splits all the preferences randomly into a training and a testing set.  
> GenericRecommenderIRStatsEvaluator takes one user at a time, removes their top 
> "at" preferences (the highest-rated ones), and counts how many of them the 
> system recommends back.
>
> I have two use cases that both deal with temporal dynamics.  In one case, 
> there may be expired items that can be used for building a training model, 
> but not a test model.  In the other, I may want to simulate the behavior of a 
> real system by building a preference matrix on days 1 through k and testing on 
> the ratings the user generated on day k+1.  In this case, it's not items but 
> preferences (user, item, rating triplets) that may belong only to the 
> training set.  Before we discuss appropriate design, are there any other use 
> cases we need to keep in mind?
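
For the second use case quoted above (build on days 1 through k, test on day 
k+1), a split along those lines might look roughly like the sketch below. It 
assumes the DataModel carries timestamps via getPreferenceTime(), as 
FileDataModel does when the input has a timestamp column; splitByTime and the 
cutoff parameter are made-up names, and java.util / org.apache.mahout.cf.taste 
imports are omitted.

    // Sketch only: split one DataModel into a training model (preferences up to
    // the cutoff) and a test model (preferences after the cutoff) by timestamp.
    static DataModel[] splitByTime(DataModel model, long cutoffTimestamp) throws TasteException {
      FastByIDMap<PreferenceArray> training = new FastByIDMap<PreferenceArray>();
      FastByIDMap<PreferenceArray> testing = new FastByIDMap<PreferenceArray>();
      LongPrimitiveIterator userIDs = model.getUserIDs();
      while (userIDs.hasNext()) {
        long userID = userIDs.nextLong();
        List<Preference> trainPrefs = new ArrayList<Preference>();
        List<Preference> testPrefs = new ArrayList<Preference>();
        for (Preference pref : model.getPreferencesFromUser(userID)) {
          Long time = model.getPreferenceTime(userID, pref.getItemID());
          // Preferences with no timestamp, or dated up to the cutoff, go to training.
          if (time != null && time > cutoffTimestamp) {
            testPrefs.add(pref);
          } else {
            trainPrefs.add(pref);
          }
        }
        if (!trainPrefs.isEmpty()) {
          training.put(userID, new GenericUserPreferenceArray(trainPrefs));
        }
        if (!testPrefs.isEmpty()) {
          testing.put(userID, new GenericUserPreferenceArray(testPrefs));
        }
      }
      return new DataModel[] {new GenericDataModel(training), new GenericDataModel(testing)};
    }

As noted in the comment above, a split like this fits the estimation test more 
naturally than the IR test.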
