[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170003#comment-13170003 ]
Anatoliy Kats commented on MAHOUT-906:
--------------------------------------

You've actually convinced me to change the estimate test. Changing relevantItemIDs will not do what I am trying to do. Suppose John is the current user, and I restrict his relevant items to the preferences he expressed before Dec 5th. I can remove them, sure, but then I'll be building a model based on all other users' preferences: I'd be using Mary's preferences from Dec 10th to predict John's on Dec 5th. That misses the entire point, which is that user preferences are dynamic and change over time. The production algorithm will not be able to predict backwards, so our test environment should not allow that either. That's why I want all the data before a fixed time to be my training set, and the data for the following time period to be my test set. Which recommender do you think I should modify to make that happen?

> Allow collaborative filtering evaluators to use custom logic in splitting data set
> ----------------------------------------------------------------------------------
>
> Key: MAHOUT-906
> URL: https://issues.apache.org/jira/browse/MAHOUT-906
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.5
> Reporter: Anatoliy Kats
> Priority: Minor
> Labels: features
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> I want to start a discussion about factoring out the logic used in splitting the data set into training and testing. Here is how things stand: There are two independent evaluator-based classes. AbstractDifferenceRecommenderEvaluator splits all the preferences randomly into a training and testing set. GenericRecommenderIRStatsEvaluator takes one user at a time, removes their top AT preferences, and counts how many of them the system recommends back.
> I have two use cases that both deal with temporal dynamics.
> In one case, there may be expired items that can be used for building a training model, but not a test model. In the other, I may want to simulate the behavior of a real system by building a preference matrix on days 1-k, and testing on the ratings the user generated on day k+1. In this case, it's not items, but preferences (user, item, rating triplets) which may belong only to the training set. Before we discuss appropriate design, are there any other use cases we need to keep in mind?
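
For concreteness, here is a minimal sketch of the days 1-k / day k+1 split described above, assuming every preference carries a timestamp. TemporalSplit, TimedPreference, Split and splitAt are hypothetical names made up for illustration; none of them are Mahout classes, and how such a split would plug into the existing evaluators is left open.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of a time-based train/test split; not part of Mahout. */
public class TemporalSplit {

    /** A timestamped (user, item, rating) triplet. Hypothetical type, not Mahout's Preference. */
    public static final class TimedPreference {
        public final long userID;
        public final long itemID;
        public final float value;
        public final long timestamp; // any monotonic time unit, e.g. a day index

        public TimedPreference(long userID, long itemID, float value, long timestamp) {
            this.userID = userID;
            this.itemID = itemID;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Holds the two halves of the split. */
    public static final class Split {
        public final List<TimedPreference> training = new ArrayList<>();
        public final List<TimedPreference> test = new ArrayList<>();
    }

    /**
     * Preferences strictly before cutoff go to training; preferences in
     * [cutoff, testEnd) go to test; anything at or after testEnd is dropped.
     * The split is global, so no user's later data can leak into training.
     */
    public static Split splitAt(List<TimedPreference> prefs, long cutoff, long testEnd) {
        Split split = new Split();
        for (TimedPreference p : prefs) {
            if (p.timestamp < cutoff) {
                split.training.add(p);
            } else if (p.timestamp < testEnd) {
                split.test.add(p);
            }
        }
        return split;
    }

    public static void main(String[] args) {
        List<TimedPreference> prefs = new ArrayList<>();
        prefs.add(new TimedPreference(1L, 10L, 4.0f, 3));  // John, day 3
        prefs.add(new TimedPreference(1L, 11L, 5.0f, 5));  // John, day 5
        prefs.add(new TimedPreference(2L, 10L, 2.0f, 10)); // Mary, day 10

        // Train on everything before day 5, test on day 5 only.
        Split split = splitAt(prefs, 5, 6);
        System.out.println("training=" + split.training.size()
                + " test=" + split.test.size());
    }
}
```

Because the cutoff applies to all users at once, Mary's day-10 preference never reaches the model that is scored on John's day-5 preference, which is the backwards prediction the comment wants to rule out.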