[
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029445#comment-14029445
]
ASF GitHub Bot commented on MAHOUT-1464:
----------------------------------------
Github user sscdotopen commented on a diff in the pull request:
https://github.com/apache/mahout/pull/12#discussion_r13713951
--- Diff: spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala ---
@@ -0,0 +1,214 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.mahout.cf
+
+import org.apache.mahout.math._
+import scalabindings._
+import RLikeOps._
+import drm._
+import RLikeDrmOps._
+import org.apache.mahout.sparkbindings._
+import scala.collection.JavaConversions._
+import org.apache.mahout.math.stats.LogLikelihood
+import collection._
+import org.apache.mahout.common.RandomUtils
+import org.apache.mahout.math.function.{VectorFunction, Functions}
+
+
+/**
+ * based on "Ted Dunnning & Ellen Friedman: Practical Machine Learning,
Innovations in Recommendation",
+ * available at http://www.mapr.com/practical-machine-learning
+ *
+ * see also "Sebastian Schelter, Christoph Boden, Volker Markl:
+ * Scalable Similarity-Based Neighborhood Methods with MapReduce
+ * ACM Conference on Recommender Systems 2012"
+ */
+object CooccurrenceAnalysis extends Serializable {
+
+ /** Compares (Int,Double) pairs by the second value */
+ private val orderByScore = Ordering.fromLessThan[(Int, Double)] { case ((_, score1), (_, score2)) => score1 > score2 }
+
+ def cooccurrences(drmARaw: DrmLike[Int], randomSeed: Int = 0xdeadbeef, maxInterestingItemsPerThing: Int = 50,
+ maxNumInteractions: Int = 500, drmBs: Array[DrmLike[Int]] = Array()): List[DrmLike[Int]] = {
+
+ implicit val distributedContext = drmARaw.context
+
+ // Apply selective downsampling, pin resulting matrix
+ val drmA = sampleDownAndBinarize(drmARaw, randomSeed, maxNumInteractions)
+
+ // num users, which equals the maximum number of interactions per item
+ val numUsers = drmA.nrow.toInt
+
+ // Compute & broadcast the number of interactions per thing in A
+ val bcastInteractionsPerItemA = drmBroadcast(drmA.colCounts)
--- End diff --
drmA is already binary here, so we could use colSums
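A minimal sketch of that suggestion (not part of the patch), assuming drmA really is the 0/1 matrix produced by sampleDownAndBinarize, so that per-column sums and per-column non-zero counts coincide:

    // sketch only: on a binary matrix, summing a column counts its non-zero
    // entries, so colSums can stand in for colCounts here
    val bcastInteractionsPerItemA = drmBroadcast(drmA.colSums)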
> Cooccurrence Analysis on Spark
> ------------------------------
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Environment: hadoop, spark
> Reporter: Pat Ferrel
> Assignee: Pat Ferrel
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM
> can be used as input.
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has
> several applications including cross-action recommendations.
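For illustration only (not from the ticket), a hedged sketch of how such a DRM-DSL-compatible entry point might be driven; the context setup, matrix values, and variable names below are assumptions made for the example:

    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.sparkbindings._
    import org.apache.mahout.cf.CooccurrenceAnalysis

    // assumed: a local Spark-backed distributed context just for this sketch
    implicit val ctx = mahoutSparkContext(masterUrl = "local", appName = "coocc-sketch")

    // tiny user x item interaction matrix (rows = users, columns = items)
    val userItemInteractions = dense(
      (1, 0, 1),   // user 0 interacted with items 0 and 2
      (0, 1, 1))   // user 1 interacted with items 1 and 2
    val drmA = drmParallelize(userItemInteractions, numPartitions = 2)

    // head of the returned list is the LLR-filtered item-item indicator DRM for A
    val indicators = CooccurrenceAnalysis.cooccurrences(drmA).head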
--
This message was sent by Atlassian JIRA
(v6.2#6252)