[ https://issues.apache.org/jira/browse/MAHOUT-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated MAHOUT-910:
-----------------------------
    Attachment: SamplingCandidateItemsStrategy.java

It changed so much that it might be easier to read the new source file.

Here I've gone another way: there are three parameters to control, but they're all just the "factor" from Sebastian's previous code. Each one lets you specify a limit of the form f*log(n), so the limits are logarithmic in the number of items/users. How's this? There's no overall cap, since the cap is determined by these three values.

> Improve sampling in SamplingCandidateItemStrategy, optimize intersection computations
> -------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-910
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-910
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.5
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>             Fix For: 0.6
>
>         Attachments: MAHOUT-910.patch, MAHOUT-910.patch, MAHOUT-910.patch, SamplingCandidateItemsStrategy.java
>
>
> Per the lengthy discussion on the mailing list about optimizing SamplingCandidateItemStrategy and related code, I'm opening this placeholder issue.
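
For readers without the attachment, here is a minimal sketch of what a limit of the form f*log(n) from the comment above could look like. It is not the code in the attached file. The class name, the method name, the choice of natural logarithm, rounding up, and the floor of 1 are all assumptions made for illustration.

```java
/**
 * Hypothetical illustration of a sampling limit of the form f * log(n).
 * Not taken from SamplingCandidateItemsStrategy.java; all names are invented.
 */
public final class LogLimitSketch {

    private LogLimitSketch() {
    }

    /**
     * Computes a limit that grows logarithmically with the number of things
     * (items or users).
     *
     * @param factor the multiplier f; assumed non-negative
     * @param count  the number of items or users n; assumed at least 1
     * @return ceil(f * ln(n)), but never less than 1 (the floor is an assumption)
     */
    public static int logLimit(double factor, int count) {
        if (factor < 0.0 || count < 1) {
            throw new IllegalArgumentException("factor must be >= 0 and count must be >= 1");
        }
        // Natural log is an assumption; the comment only says "f*log(n)".
        long limit = (long) Math.ceil(factor * Math.log(count));
        // Keep at least one sample and stay within int range.
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1L, limit));
    }

    public static void main(String[] args) {
        double factor = 10.0; // hypothetical value for one of the three factors
        for (int n : new int[] {1000, 1000000}) {
            System.out.println("n=" + n + " limit=" + logLimit(factor, n));
        }
    }
}
```

With one factor per limit, the three values together bound the amount of sampling work, which is consistent with the remark that no separate overall cap is needed.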