Hi

I would like to unsubscribe from MAHOUT.

Thanking you
Tanya

On Wed, Apr 28, 2010 at 1:38 AM, Ted Dunning (JIRA) <j...@apache.org> wrote:

>
> [https://issues.apache.org/jira/browse/MAHOUT-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861520#action_12861520]
>
> Ted Dunning commented on MAHOUT-305:
> ------------------------------------
>
> My own approach in the past was to group on user to get a count as well as
> a list of items for that user.  This can be done in one MR step with a bit
> of fancy footwork, or in two if you want simple.  The fancy footwork involves
> sampling the item list as we read it into memory, to avoid keeping too many
> items.  It is relatively easy to do the sampling in a completely fair way,
> giving all items an equal chance of survival, by using a swapping algorithm.  A
> completely sampling is also trivial.  With some thought, it is probably
> possible to do various recency weighted samples as well.
>
> With the count of items for each user, or a clever on-line sampling
> algorithm, I can down-sample the user list before running the actual
> cooccurrence counting step.  This is a good point to drop users with fewer
> than k_min items.  k_min should be at least 2, since users with one item
> cannot give rise to non-trivial cooccurrence.  A value of 3-5 isn't bad either.
>
> The total time involved is largely dominated by the original data reading, so
> the extra MR step doesn't hurt all that much.  The win obtained by avoiding
> the quadratic explosion of the cooccurrence step is massive.
>
>
>
> > Combine both cooccurrence-based CF M/R jobs
> > -------------------------------------------
> >
> >                 Key: MAHOUT-305
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-305
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Collaborative Filtering
> >    Affects Versions: 0.2
> >            Reporter: Sean Owen
> >            Assignee: Ankur
> >            Priority: Minor
> >
> > We have two different but essentially identical MapReduce jobs to make
> > recommendations based on item co-occurrence:
> > org.apache.mahout.cf.taste.hadoop.{item,cooccurrence}. They ought to be
> > merged. Not sure exactly how to approach that but noting this in JIRA, per
> > Ankur.
>
> --
>  This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
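
A note on the "swapping algorithm" mentioned in the quoted comment: the comment does not name it, so this is only a guess that something like standard reservoir sampling is meant. The sketch below is plain Python, not Mahout code, and the function name `reservoir_sample` and its parameters are made up for illustration. It keeps at most k items from a stream while giving every item the same chance of surviving.

```python
import random


def reservoir_sample(items, k, rng=None):
    """Keep at most k items from an iterable, each with equal probability.

    Hypothetical helper for illustration; not part of Mahout.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(items, start=1):
        if len(reservoir) < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, n). The new item swaps in only if the slot
            # falls inside the reservoir, which happens with probability k/n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Only k items are held in memory at any time, so a user with a very long item list never needs to be loaded in full.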

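To illustrate why dropping small users and capping large ones matters before cooccurrence counting: a user with n distinct items contributes n(n-1)/2 item pairs, so a few heavy users can dominate the work. The sketch below is a small in-memory illustration in plain Python, not the Mahout MapReduce job. The function name `cooccurrence_counts` and the `k_max` cap are assumptions of this example; only `k_min` comes from the quoted comment.

```python
import random
from collections import Counter
from itertools import combinations


def cooccurrence_counts(user_items, k_min=2, k_max=None, rng=None):
    """Count item pairs that share a user (hypothetical helper, not Mahout).

    user_items: dict mapping a user id to a list of item ids.
    k_min: users with fewer distinct items than this are dropped.
    k_max: if set, users with more distinct items are down-sampled to k_max.
    """
    rng = rng or random.Random()
    counts = Counter()
    for items in user_items.values():
        unique = sorted(set(items))
        if len(unique) < k_min:
            continue  # too few items to give useful cooccurrence
        if k_max is not None and len(unique) > k_max:
            # Uniform down-sample; re-sort so each pair has one canonical order.
            unique = sorted(rng.sample(unique, k_max))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    data = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["a"]}
    print(cooccurrence_counts(data, k_min=2))
```

With a cap of `k_max`, no single user emits more than k_max(k_max-1)/2 pairs, which bounds the quadratic blow-up.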