[
https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860223#action_12860223
]
Sean Owen commented on MAHOUT-305:
----------------------------------
And now more thoughts:
Yes, all the code is checked in.
This is still running perhaps slower than I'd like. Distributing the
computation further slowed things down considerably in the I/O phases -- but
it avoided the use of MapFile, which in the end was just being used very
wrongly. So a net win.
The slowest step by far is outputting the partial vector products. Each is as
big as a column of the co-occurrence matrix (which is sparse, yes), and one is
output for each preference value. That's huge. This would be an ideal place for
a combiner, but the step is a reducer, so a combiner is not available (?)
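To make the cost concrete, here is a minimal sketch of the partial vector
products described above, assuming each one is a sparse co-occurrence column
scaled by a single preference value. The class name, the Map-based sparse
vectors and the sample data are all hypothetical and are not the classes used
in the checked-in job. The addInto method shows the kind of merge a combiner
would do if one were available at this step.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: names and the Map-based sparse vectors are assumptions,
// not the classes used in the Mahout job.
final class PartialProducts {

    // One partial product: a sparse co-occurrence column scaled by one preference value.
    static Map<Integer, Double> scale(Map<Integer, Double> column, double pref) {
        Map<Integer, Double> out = new HashMap<>();
        for (Map.Entry<Integer, Double> e : column.entrySet()) {
            out.put(e.getKey(), e.getValue() * pref);
        }
        return out;
    }

    // Adds a partial product into a running sum, the merge a combiner would perform.
    static void addInto(Map<Integer, Double> sum, Map<Integer, Double> partial) {
        for (Map.Entry<Integer, Double> e : partial.entrySet()) {
            sum.merge(e.getKey(), e.getValue(), Double::sum);
        }
    }

    public static void main(String[] args) {
        // Hypothetical co-occurrence columns for two items one user has rated.
        Map<Integer, Double> colA = Map.of(1, 2.0, 2, 1.0);
        Map<Integer, Double> colB = Map.of(2, 3.0, 3, 1.0);

        // Without pre-summing, both scaled columns are written out separately;
        // with it, only the single summed vector would need to be written.
        Map<Integer, Double> sum = new HashMap<>();
        addInto(sum, scale(colA, 5.0)); // user rated item A with 5.0
        addInto(sum, scale(colB, 2.0)); // user rated item B with 2.0
        System.out.println(sum);
    }
}
```

Since one column-sized vector is produced per preference value, the output
volume grows with the number of preferences times the column size, which is
why summing early would help so much.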
Co-occurrence is also slowish. It does use a combiner, but to get a good hit
rate, it needs to have a very large buffer.
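The buffer-size effect can be illustrated with a toy model. This is not
Hadoop's combiner machinery; it is a hypothetical bounded buffer that counts
item pairs in memory and flushes everything when full, which is only an
assumption about how such buffering might behave. A "hit" is a pair that is
already in the buffer and so gets merged instead of emitted again.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a bounded aggregation buffer; all names are hypothetical.
final class BoundedPairCounter {
    private final int capacity;
    private final Map<Long, Integer> buffer = new HashMap<>();
    private long hits;    // pairs merged into an existing buffer entry
    private long emitted; // (pair, count) records written out on flush

    BoundedPairCounter(int capacity) {
        this.capacity = capacity;
    }

    void add(int itemA, int itemB) {
        // Pack the two item IDs into one long key.
        long key = ((long) itemA << 32) | (itemB & 0xFFFFFFFFL);
        Integer old = buffer.get(key);
        if (old != null) {
            buffer.put(key, old + 1);
            hits++;
            return;
        }
        if (buffer.size() >= capacity) {
            flush(); // buffer full: everything goes out before the new pair goes in
        }
        buffer.put(key, 1);
    }

    void flush() {
        emitted += buffer.size(); // a real job would write each record here
        buffer.clear();
    }

    long hits() { return hits; }
    long emitted() { return emitted; }

    public static void main(String[] args) {
        for (int capacity : new int[] {2, 4}) {
            BoundedPairCounter counter = new BoundedPairCounter(capacity);
            // Cycle through 4 distinct pairs, 3 times each.
            for (int round = 0; round < 3; round++) {
                for (int item = 0; item < 4; item++) {
                    counter.add(item, item + 1);
                }
            }
            counter.flush();
            System.out.println("capacity=" + capacity
                + " hits=" + counter.hits() + " emitted=" + counter.emitted());
        }
    }
}
```

In this model a buffer smaller than the number of distinct pairs in flight
merges nothing, while one large enough to hold them all merges every repeat,
which matches the observation that a good hit rate needs a very large buffer.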
Everything works quite well if you're willing to prune data. For example, very
roughly, on a 10M rating data set -- *but keeping only 20 prefs per user for
each of 70,000 users* -- the total time per user is a matter of seconds of
machine time. Not too bad.
But take the pruning away and the running time still balloons quite a bit.
Naturally, pruning is a good thing, but it seems like we should be able to
speed things up more.
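For reference, the kind of per-user pruning mentioned above (keeping only 20
prefs per user) might look like the sketch below. The comment does not say how
the retained prefs are chosen, so keeping the highest-valued ones is purely an
assumption here, and the Pref and PrefPruner names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the checked-in code.
final class PrefPruner {

    // Hypothetical preference record: an item ID and the user's rating of it.
    static final class Pref {
        final long itemId;
        final double value;

        Pref(long itemId, double value) {
            this.itemId = itemId;
            this.value = value;
        }
    }

    // Keeps at most maxPrefs preferences for one user.
    // Assumed policy: keep the highest-valued preferences.
    static List<Pref> prune(List<Pref> prefs, int maxPrefs) {
        List<Pref> sorted = new ArrayList<>(prefs);
        sorted.sort((a, b) -> Double.compare(b.value, a.value)); // descending by value
        int keep = Math.min(maxPrefs, sorted.size());
        return new ArrayList<>(sorted.subList(0, keep));
    }

    public static void main(String[] args) {
        List<Pref> prefs = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            prefs.add(new Pref(i, i % 5 + 1)); // hypothetical ratings from 1 to 5
        }
        System.out.println(prune(prefs, 20).size() + " prefs kept");
    }
}
```

Capping prefs per user bounds both the number of partial vector products and
the number of co-occurrence pairs generated for that user.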
> Combine both cooccurrence-based CF M/R jobs
> -------------------------------------------
>
> Key: MAHOUT-305
> URL: https://issues.apache.org/jira/browse/MAHOUT-305
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.2
> Reporter: Sean Owen
> Assignee: Ankur
> Priority: Minor
>
> We have two different but essentially identical MapReduce jobs to make
> recommendations based on item co-occurrence:
> org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> merged. Not sure exactly how to approach that but noting this in JIRA, per
> Ankur.
--
This message is automatically generated by JIRA.