Hi, I would like to unsubscribe from MAHOUT.
Thanking you,
Tanya

On Wed, Apr 28, 2010 at 1:38 AM, Ted Dunning (JIRA) <j...@apache.org> wrote:

> [ https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861520#action_12861520 ]
>
> Ted Dunning commented on MAHOUT-305:
> ------------------------------------
>
> My own approach in the past was to group on user to get a count as well as
> a list of items for that user. This can be done in one MR step with a bit
> of fancy footwork or in two if you want simple. The fancy footwork involves
> reading the item list into memory as we sample to avoid keeping too many.
> It is relatively easy to do the sampling in a completely fair way, allowing
> all samples equal chance of survival by using a swapping algorithm. A
> completely sampling is also trivial. With some thought, it is probably
> possible to do various recency weighted samples as well.
>
> With the count for the items for each user, or a clever on-line sampling
> algorithm I can down-sample the user list before running the actual
> cooccurrence counting step. This is a good point to drop users with < k_min
> items. k_min should be at least 2 since users with one item cannot give
> rise to non-trivial cooccurrence. A value of 3-5 isn't bad either.
>
> The total time involved is pretty dominated by the original data reading so
> the extra MR step doesn't hurt all that much. The win obtained by avoiding
> quadratic explosion of the cooccurrence step is massive.
>
> > Combine both cooccurrence-based CF M/R jobs
> > -------------------------------------------
> >
> >                 Key: MAHOUT-305
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Collaborative Filtering
> >    Affects Versions: 0.2
> >            Reporter: Sean Owen
> >            Assignee: Ankur
> >            Priority: Minor
> >
> > We have two different but essentially identical MapReduce jobs to make
> > recommendations based on item co-occurrence:
> > org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> > merged. Not sure exactly how to approach that but noting this in JIRA, per
> > Ankur.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
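The pipeline in Ted's comment (group by user, keep a fair sample of each user's items, drop users with fewer than k_min items, then count co-occurring item pairs) can be sketched on a single machine as below. This is a minimal illustration under stated assumptions, not Mahout's code: it assumes the "swapping algorithm" is standard reservoir sampling (Algorithm R), and the function names and the per-user cap `max_items` are hypothetical. In a real job, `group_by_user` would be the MapReduce shuffle rather than an in-memory dictionary.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], max_items: int,
                     rng: random.Random) -> Tuple[int, List[T]]:
    """Return (total item count, fair sample of at most max_items items).

    Algorithm R (assumed to be the "swapping algorithm"): each item of a
    stream of n items survives with probability min(1, max_items / n),
    and only max_items items are ever held in memory.
    """
    reservoir: List[T] = []
    count = 0
    for item in items:
        count += 1
        if len(reservoir) < max_items:
            reservoir.append(item)
        else:
            # Draw a slot in [0, count); swap the new item in only if the
            # slot falls inside the reservoir.
            j = rng.randrange(count)
            if j < max_items:
                reservoir[j] = item
    return count, reservoir


def group_by_user(prefs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Stand-in for the map + shuffle step: user -> list of items."""
    by_user: Dict[str, List[str]] = defaultdict(list)
    for user, item in prefs:
        by_user[user].append(item)
    return by_user


def downsample_users(prefs: Iterable[Tuple[str, str]], max_items: int,
                     k_min: int = 2, seed: int = 0) -> Dict[str, List[str]]:
    """Cap each user's item list and drop users with fewer than k_min items."""
    if k_min < 2 or max_items < k_min:
        raise ValueError("need 2 <= k_min <= max_items")
    rng = random.Random(seed)
    kept: Dict[str, List[str]] = {}
    for user, items in group_by_user(prefs).items():
        count, sample = reservoir_sample(items, max_items, rng)
        if count >= k_min:  # a single item cannot co-occur with anything
            kept[user] = sample
    return kept


def count_cooccurrences(user_items: Dict[str, List[str]]) -> Counter:
    """Count how many users share each unordered pair of items."""
    counts: Counter = Counter()
    for items in user_items.values():
        # Deduplicate and sort so each pair has one canonical key (a, b).
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts
```

For example, with `prefs = [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "a"), ("u2", "b"), ("u3", "c")]` and `k_min=2`, user `u3` is dropped, and `count_cooccurrences(downsample_users(prefs, max_items=10))` yields 2 for the pair `("a", "b")` and 1 each for `("a", "c")` and `("b", "c")`. Capping each user at `max_items` bounds that user's contribution to at most `max_items * (max_items - 1) / 2` pairs, which is what keeps a few very heavy users from causing the quadratic explosion the comment mentions.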