Author: buildbot
Date: Mon Jan 19 22:07:31 2015
New Revision: 936845
Log:
Staging update by buildbot for mahout
Modified:
websites/staging/mahout/trunk/content/ (props changed)
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Jan 19 22:07:31 2015
@@ -1 +1 @@
-1647076
+1653133
Modified:
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
---
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
(original)
+++
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Mon Jan 19 22:07:31 2015
@@ -250,6 +250,14 @@
be used to create "other people also liked these things" type recommendations
and paired with a search engine can
personalize recommendations for individual users. <em>spark-rowsimilarity</em>
can provide non-personalized content based
recommendations and when paired with a search engine can be used to
personalize content based recommendations.</p>
+<h2 id="references">References</h2>
+<ol>
+<li>A free ebook, which talks about the general idea: <a
href="https://www.mapr.com/practical-machine-learning">Practical Machine
Learning</a></li>
+<li>A slide deck, which talks about mixing actions or other indicators: <a
href="http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/">Creating
a Unified Recommender</a></li>
+<li>Two blog posts: <a
href="http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/">What's
New in Recommenders: part #1</a>
+and <a
href="http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/">What's
New in Recommenders: part #2</a></li>
+<li>A post describing the log-likelihood ratio (LLR): <a
href="http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html">Surprise
and Coincidence</a>. LLR is used to reduce noise in the data while keeping the
calculations at O(n) complexity.</li>
+</ol>
<p>Below are the command line jobs but the drivers and associated code can
also be customized and accessed from the Scala APIs.</p>
<h2 id="1-spark-itemsimilarity">1. spark-itemsimilarity</h2>
<p><em>spark-itemsimilarity</em> is the Spark counterpart of the Mahout
mapreduce job called <em>itemsimilarity</em>. It takes in elements of
interactions, which have userID, itemID, and optionally a value. It will
produce one or more indicator matrices created by comparing every user's
interactions with those of every other user. The indicator matrix is an item x item
matrix where the values are log-likelihood ratio strengths. For the legacy
mapreduce version, there were several possible similarity measures but these
are being deprecated in favor of LLR because in practice it performs the
best.</p>