[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029464#comment-14029464
 ] 

ASF GitHub Bot commented on MAHOUT-1464:
----------------------------------------

Github user pferrel commented on a diff in the pull request:

    https://github.com/apache/mahout/pull/12#discussion_r13714291
  
    --- Diff: spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala ---
    @@ -0,0 +1,214 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.mahout.cf
    +
    +import org.apache.mahout.math._
    +import scalabindings._
    +import RLikeOps._
    +import drm._
    +import RLikeDrmOps._
    +import org.apache.mahout.sparkbindings._
    +import scala.collection.JavaConversions._
    +import org.apache.mahout.math.stats.LogLikelihood
    +import collection._
    +import org.apache.mahout.common.RandomUtils
    +import org.apache.mahout.math.function.{VectorFunction, Functions}
    +
    +
    +/**
    + * based on "Ted Dunning & Ellen Friedman: Practical Machine Learning, Innovations in Recommendation",
    + * available at http://www.mapr.com/practical-machine-learning
    + *
    + * see also "Sebastian Schelter, Christoph Boden, Volker Markl:
    + * Scalable Similarity-Based Neighborhood Methods with MapReduce
    + * ACM Conference on Recommender Systems 2012"
    + */
    +object CooccurrenceAnalysis extends Serializable {
    +
    +  /** Compares (Int,Double) pairs by the second value */
    +  private val orderByScore = Ordering.fromLessThan[(Int, Double)] { case ((_, score1), (_, score2)) => score1 > score2}
    +
    +  def cooccurrences(drmARaw: DrmLike[Int], randomSeed: Int = 0xdeadbeef, maxInterestingItemsPerThing: Int = 50,
    +                    maxNumInteractions: Int = 500, drmBs: Array[DrmLike[Int]] = Array()): List[DrmLike[Int]] = {
    +
    +    implicit val distributedContext = drmARaw.context
    +
    +    // Apply selective downsampling, pin resulting matrix
    +    val drmA = sampleDownAndBinarize(drmARaw, randomSeed, maxNumInteractions)
    +
    +    // num users, which equals the maximum number of interactions per item
    +    val numUsers = drmA.nrow.toInt
    +
    +    // Compute & broadcast the number of interactions per thing in A
    +    val bcastInteractionsPerItemA = drmBroadcast(drmA.colCounts)
    --- End diff --
    
    colCounts, or whatever we end up calling it, is just as efficient, runs 
distributed, and tells the reader which value is the important one. 
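
    For readers who have not seen the operation, here is a minimal sketch of 
what a per-column interaction count computes. It is plain Scala with no 
Mahout or Spark dependency; the `SparseRow` type and the `columnCounts` 
helper are hypothetical stand-ins for illustration and are not the DRM API 
used in the diff.

```scala
// Hypothetical stand-in for a sparse interaction matrix: each row (user)
// maps a column index (item) to an interaction strength.
// This is an assumption for illustration, not the Mahout DRM API.
object ColumnCountsSketch {
  type SparseRow = Map[Int, Double]

  /** Number of non-zero entries in each column, i.e. interactions per item. */
  def columnCounts(rows: Seq[SparseRow], numCols: Int): Array[Int] = {
    val counts = Array.fill(numCols)(0)
    // Explicitly stored zeros are skipped so they do not count as interactions.
    for (row <- rows; (col, value) <- row if value != 0.0) counts(col) += 1
    counts
  }
}
```

    The result has one entry per item, which is the kind of vector the diff 
broadcasts as `bcastInteractionsPerItemA`.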


> Cooccurrence Analysis on Spark
> ------------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 
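
The issue description refers to LLR (log-likelihood ratio) scoring, and the 
diff imports `org.apache.mahout.math.stats.LogLikelihood`. As a rough 
illustration only, the standalone sketch below computes an entropy-based LLR 
score for a 2x2 contingency table of cooccurrence counts. The object and 
method names are assumptions made for this example; it is not a copy of the 
Mahout class.

```scala
// Standalone sketch of an entropy-based log-likelihood ratio (LLR) score.
// k11: both events occur, k12 / k21: only one occurs, k22: neither occurs.
// Names here are illustrative assumptions, not the Mahout API.
object LlrSketch {
  private def xLogX(x: Long): Double =
    if (x == 0L) 0.0 else x * math.log(x.toDouble)

  // Unnormalized Shannon entropy of a set of counts.
  private def entropy(counts: Long*): Double =
    xLogX(counts.sum) - counts.map(xLogX).sum

  def logLikelihoodRatio(k11: Long, k12: Long, k21: Long, k22: Long): Double = {
    val rowEntropy = entropy(k11 + k12, k21 + k22)
    val columnEntropy = entropy(k11 + k21, k12 + k22)
    val matrixEntropy = entropy(k11, k12, k21, k22)
    // Clamp at zero to absorb small negative values from round-off.
    math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy))
  }
}
```

A table whose rows and columns are independent scores close to zero, while a 
table with strong association between the two events scores high.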



--
This message was sent by Atlassian JIRA
(v6.2#6252)
