[ https://issues.apache.org/jira/browse/MAHOUT-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anatoliy Kats updated MAHOUT-906:
---------------------------------

    Status: Patch Available  (was: Open)

> Allow collaborative filtering evaluators to use custom logic in splitting data set
> ----------------------------------------------------------------------------------
>
>                 Key: MAHOUT-906
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-906
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Anatoliy Kats
>            Priority: Minor
>              Labels: features
>         Attachments: MAHOUT-906.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I want to start a discussion about factoring out the logic used to split
> the data set into training and testing sets. Here is how things stand: there
> are two independent evaluator classes.
> AbstractDifferenceRecommenderEvaluator splits all the preferences randomly
> into a training set and a testing set. GenericRecommenderIRStatsEvaluator
> takes one user at a time, removes their top AT preferences, and counts how
> many of them the system recommends back.
> I have two use cases, and both deal with temporal dynamics. In the first,
> there may be expired items that can be used to build the training model but
> must not be used for testing. In the second, I may want to simulate the
> behavior of a real system by building a preference matrix from days 1 through
> k and testing on the ratings the user generated on day k+1. In this case, it
> is not items but preferences (user, item, rating triplets) that may belong
> only to the training set. Before we discuss an appropriate design, are there
> any other use cases we need to keep in mind?
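One possible shape for "factoring out the splitting logic" is a small strategy interface that an evaluator accepts instead of hard-coding its split. The sketch below is illustrative only: `Pref`, `Split`, `DataSplitter`, `RandomSplitter`, `TemporalSplitter` and `TrainingOnlyItemsSplitter` are hypothetical names, not Mahout classes, and the attached MAHOUT-906.patch may take a different approach. It assumes each preference carries an integer day number and needs Java 16+ for records. `RandomSplitter` roughly mirrors the random split described for AbstractDifferenceRecommenderEvaluator, `TemporalSplitter` covers the "days 1 through k, test on day k+1" case, and `TrainingOnlyItemsSplitter` covers the expired-items case by moving held-out preferences on those items back into training.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical (user, item, rating) triplet plus the day it was recorded; not a Mahout class. */
record Pref(long userId, long itemId, float rating, int day) {}

/** Hypothetical split result: data used to build the model and data held out to test it. */
record Split(List<Pref> training, List<Pref> testing) {}

/** Hypothetical strategy an evaluator could accept instead of hard-coding its split. */
interface DataSplitter {
  Split split(List<Pref> prefs);
}

/** Random split: each preference goes to training with probability trainingPercentage. */
final class RandomSplitter implements DataSplitter {
  private final double trainingPercentage;
  private final Random random;

  RandomSplitter(double trainingPercentage, long seed) {
    this.trainingPercentage = trainingPercentage;
    this.random = new Random(seed); // fixed seed keeps evaluations repeatable
  }

  @Override
  public Split split(List<Pref> prefs) {
    List<Pref> training = new ArrayList<>();
    List<Pref> testing = new ArrayList<>();
    for (Pref p : prefs) {
      if (random.nextDouble() < trainingPercentage) {
        training.add(p);
      } else {
        testing.add(p);
      }
    }
    return new Split(training, testing);
  }
}

/** Temporal split: train on days 1..k, test on day k+1; preferences from later days are ignored. */
final class TemporalSplitter implements DataSplitter {
  private final int lastTrainingDay; // k

  TemporalSplitter(int lastTrainingDay) {
    this.lastTrainingDay = lastTrainingDay;
  }

  @Override
  public Split split(List<Pref> prefs) {
    List<Pref> training = new ArrayList<>();
    List<Pref> testing = new ArrayList<>();
    for (Pref p : prefs) {
      if (p.day() <= lastTrainingDay) {
        training.add(p);
      } else if (p.day() == lastTrainingDay + 1) {
        testing.add(p);
      }
    }
    return new Split(training, testing);
  }
}

/** Wraps another splitter so preferences on expired items never reach the test set. */
final class TrainingOnlyItemsSplitter implements DataSplitter {
  private final DataSplitter delegate;
  private final Set<Long> expiredItemIds;

  TrainingOnlyItemsSplitter(DataSplitter delegate, Set<Long> expiredItemIds) {
    this.delegate = delegate;
    this.expiredItemIds = expiredItemIds;
  }

  @Override
  public Split split(List<Pref> prefs) {
    Split inner = delegate.split(prefs);
    List<Pref> training = new ArrayList<>(inner.training());
    List<Pref> testing = new ArrayList<>();
    for (Pref p : inner.testing()) {
      // Expired items may still inform the model, so move them to training instead of dropping them.
      if (expiredItemIds.contains(p.itemId())) {
        training.add(p);
      } else {
        testing.add(p);
      }
    }
    return new Split(training, testing);
  }
}
```

Because the wrapper only post-processes another splitter's output, the two use cases compose: `new TrainingOnlyItemsSplitter(new TemporalSplitter(k), expiredIds)` trains on days 1 through k plus any day k+1 preferences on expired items, and tests on the remaining day k+1 preferences. The leave-top-AT-out behavior of GenericRecommenderIRStatsEvaluator is per user rather than per data set, so it may need a per-user variant of this interface rather than fitting it directly.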