Sounds like a very plausible root cause.
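To illustrate why that matters: with strength-of-interaction values (ratings, play counts) instead of 0/1 flags, summing the row inflates the apparent number of interactions, which shrinks the per-user sample rate and downsamples users prematurely. A quick standalone sketch of the rate computation — plain Scala arrays standing in for Mahout's Vector here, and the 500 cap is just an example value:

```scala
// A user with 200 interactions of strength 5 (e.g. ratings)
// and 300 items they never touched.
val maxNumInteractions = 500
val interactionsOfUser = Array.fill(200)(5.0) ++ Array.fill(300)(0.0)

// Summing the row counts strength, not interactions: 1000.0.
val summed = interactionsOfUser.sum
// Counting non-zero entries gives the actual interaction count: 200.
val numNonZero = interactionsOfUser.count(_ != 0.0)

// Sample rate as in downsampleAndBinarize: min(cap, n) / n.
val buggyRate = math.min(maxNumInteractions, summed) / summed                  // 0.5
val correctRate = math.min(maxNumInteractions, numNonZero.toDouble) / numNonZero // 1.0

println(f"sum-based rate: $buggyRate%.2f, count-based rate: $correctRate%.2f")
```

The user is well under the 500-interaction cap, yet the sum-based rate would randomly drop half of their interactions.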
On Mon, Jun 9, 2014 at 4:03 PM, Pat Ferrel (JIRA) <j...@apache.org> wrote:
>
> [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025893#comment-14025893 ]
>
> Pat Ferrel commented on MAHOUT-1464:
> ------------------------------------
>
> It seems like the downsampleAndBinarize method is returning the wrong
> values. It is actually summing the values where it should be counting the
> non-zero elements.
>
>     // Downsample the interaction vector of each user
>     for (userIndex <- 0 until keys.size) {
>
>       val interactionsOfUser = block(userIndex, ::) // this is a Vector
>       // if the values are non-boolean, the sum will not be the number of
>       // interactions, it will be a sum of strength-of-interaction, right?
>       // val numInteractionsOfUser = interactionsOfUser.sum // doesn't this sum strength of interactions?
>       val numInteractionsOfUser = interactionsOfUser.getNumNonZeroElements() // should do this I think
>
>       val perUserSampleRate = math.min(maxNumInteractions, numInteractionsOfUser) / numInteractionsOfUser
>
>       interactionsOfUser.nonZeroes().foreach { elem =>
>         val numInteractionsWithThing = numInteractions(elem.index)
>         val perThingSampleRate = math.min(maxNumInteractions, numInteractionsWithThing) / numInteractionsWithThing
>
>         if (random.nextDouble() <= math.min(perUserSampleRate, perThingSampleRate)) {
>           // We ignore the original interaction value and create a binary 0-1 matrix
>           // as we only consider whether interactions happened or did not happen
>           downsampledBlock(userIndex, elem.index) = 1
>         }
>       }
>     }
>
> > Cooccurrence Analysis on Spark
> > ------------------------------
> >
> >                 Key: MAHOUT-1464
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Collaborative Filtering
> >         Environment: hadoop, spark
> >            Reporter: Pat Ferrel
> >            Assignee: Pat Ferrel
> >             Fix For: 1.0
> >
> >         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> > MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
> >
> > Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR)
> > that runs on Spark. This should be compatible with the Mahout Spark DRM DSL
> > so a DRM can be used as input.
> > Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence
> > has several applications including cross-action recommendations.
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)