Allow collaborative filtering evaluators to use custom logic in splitting data 
set
----------------------------------------------------------------------------------

                 Key: MAHOUT-906
                 URL: https://issues.apache.org/jira/browse/MAHOUT-906
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
    Affects Versions: 0.5
            Reporter: Anatoliy Kats
            Assignee: Sean Owen
            Priority: Minor
             Fix For: 0.6


I want to start a discussion about factoring out the logic used to split the 
data set into training and testing sets.  Here is how things stand:  there are 
two independent evaluator classes.  AbstractDifferenceRecommenderEvaluator 
splits all the preferences randomly into a training set and a testing set.  
GenericRecommenderIRStatsEvaluator takes one user at a time, removes their top 
AT preferences, and counts how many of them the system recommends back.
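
To make the contrast concrete, here is a rough sketch of the two splitting 
strategies as described above.  It is not the Mahout code: Pref, Split, 
randomSplit and holdOutTop are hypothetical names made up for illustration, 
and "top" is assumed here to mean highest-rated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class SplitSketch {

    /** Hypothetical stand-in for a (user, item, rating) preference; not a Mahout class. */
    public static final class Pref {
        public final long userId;
        public final long itemId;
        public final float rating;

        public Pref(long userId, long itemId, float rating) {
            this.userId = userId;
            this.itemId = itemId;
            this.rating = rating;
        }
    }

    /** Holds the two halves of a split. */
    public static final class Split {
        public final List<Pref> training = new ArrayList<>();
        public final List<Pref> testing = new ArrayList<>();
    }

    /** Strategy 1: send each preference to training or testing at random. */
    public static Split randomSplit(List<Pref> prefs, double trainingFraction, Random rng) {
        Split split = new Split();
        for (Pref p : prefs) {
            if (rng.nextDouble() < trainingFraction) {
                split.training.add(p);
            } else {
                split.testing.add(p);
            }
        }
        return split;
    }

    /** Strategy 2: for one user's preferences, hold out the `at` highest-rated ones. */
    public static Split holdOutTop(List<Pref> userPrefs, int at) {
        List<Pref> sorted = new ArrayList<>(userPrefs);
        // Sort by rating, highest first (assumed meaning of "top").
        sorted.sort(Comparator.comparingDouble((Pref p) -> p.rating).reversed());
        Split split = new Split();
        for (int i = 0; i < sorted.size(); i++) {
            if (i < at) {
                split.testing.add(sorted.get(i));
            } else {
                split.training.add(sorted.get(i));
            }
        }
        return split;
    }
}
```

In both cases the decision about which preferences go where is buried inside 
the evaluator, which is the part I would like to factor out.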

I have two use cases, both dealing with temporal dynamics.  In the first, there 
may be expired items that can be used for building the training model but not 
for testing.  In the second, I want to simulate the behavior of a real system 
by building a preference matrix from days 1 through k and testing on the 
ratings users generated on day k+1.  In that case it is not items but 
individual preferences ((user, item, rating) triplets) that may belong only to 
the training set.  Before we discuss an appropriate design, are there any other 
use cases we need to keep in mind?
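
As a starting point for that discussion, here is a sketch of what a pluggable 
splitter covering both use cases might look like.  Everything in it is an 
assumption: Splitter, TimedPref and trainThroughDay are hypothetical names, 
not existing Mahout API, and a plain integer day number stands in for a real 
timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TemporalSplitSketch {

    /** Hypothetical preference carrying the day it was recorded; not a Mahout class. */
    public static final class TimedPref {
        public final long userId;
        public final long itemId;
        public final float rating;
        public final int day;

        public TimedPref(long userId, long itemId, float rating, int day) {
            this.userId = userId;
            this.itemId = itemId;
            this.rating = rating;
            this.day = day;
        }
    }

    /** Holds the two halves of a split. */
    public static final class Split {
        public final List<TimedPref> training = new ArrayList<>();
        public final List<TimedPref> testing = new ArrayList<>();
    }

    /** Hypothetical hook an evaluator could call instead of hard-coding its split. */
    public interface Splitter {
        Split split(List<TimedPref> prefs);
    }

    /**
     * Train on days 1..k and test on day k+1, never testing on expired items.
     * Expired items still contribute to training; preferences after day k+1 are ignored.
     */
    public static Splitter trainThroughDay(int k, Set<Long> expiredItems) {
        return prefs -> {
            Split split = new Split();
            for (TimedPref p : prefs) {
                if (p.day <= k) {
                    split.training.add(p);
                } else if (p.day == k + 1 && !expiredItems.contains(p.itemId)) {
                    split.testing.add(p);
                }
            }
            return split;
        };
    }
}
```

An evaluator would then take a Splitter as a parameter, and the current random 
and top-AT behaviors would become two more implementations of the same 
interface.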


        
