http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-collaborative-filtering.html ---------------------------------------------------------------------- diff --git a/site/docs/2.1.0/mllib-collaborative-filtering.html b/site/docs/2.1.0/mllib-collaborative-filtering.html index e453032..b3f9e08 100644 --- a/site/docs/2.1.0/mllib-collaborative-filtering.html +++ b/site/docs/2.1.0/mllib-collaborative-filtering.html @@ -322,13 +322,13 @@ <ul id="markdown-toc"> - <li><a href="#collaborative-filtering" id="markdown-toc-collaborative-filtering">Collaborative filtering</a> <ul> - <li><a href="#explicit-vs-implicit-feedback" id="markdown-toc-explicit-vs-implicit-feedback">Explicit vs. implicit feedback</a></li> - <li><a href="#scaling-of-the-regularization-parameter" id="markdown-toc-scaling-of-the-regularization-parameter">Scaling of the regularization parameter</a></li> + <li><a href="#collaborative-filtering">Collaborative filtering</a> <ul> + <li><a href="#explicit-vs-implicit-feedback">Explicit vs. 
implicit feedback</a></li> + <li><a href="#scaling-of-the-regularization-parameter">Scaling of the regularization parameter</a></li> </ul> </li> - <li><a href="#examples" id="markdown-toc-examples">Examples</a></li> - <li><a href="#tutorial" id="markdown-toc-tutorial">Tutorial</a></li> + <li><a href="#examples">Examples</a></li> + <li><a href="#tutorial">Tutorial</a></li> </ul> <h2 id="collaborative-filtering">Collaborative filtering</h2> @@ -393,7 +393,7 @@ recommendation model by measuring the Mean Squared Error of rating prediction.</ <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.recommendation.ALS"><code>ALS</code> Scala docs</a> for more details on the API.</p> - <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.ALS</span> + <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.ALS</span> <span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.MatrixFactorizationModel</span> <span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.Rating</span> @@ -434,9 +434,9 @@ recommendation model by measuring the Mean Squared Error of rating prediction.</ <p>If the rating matrix is derived from another source of information (i.e. 
it is inferred from other signals), you can use the <code>trainImplicit</code> method to get better results.</p> - <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">alpha</span> <span class="k">=</span> <span class="mf">0.01</span> + <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">alpha</span> <span class="k">=</span> <span class="mf">0.01</span> <span class="k">val</span> <span class="n">lambda</span> <span class="k">=</span> <span class="mf">0.01</span> -<span class="k">val</span> <span class="n">model</span> <span class="k">=</span> <span class="nc">ALS</span><span class="o">.</span><span class="n">trainImplicit</span><span class="o">(</span><span class="n">ratings</span><span class="o">,</span> <span class="n">rank</span><span class="o">,</span> <span class="n">numIterations</span><span class="o">,</span> <span class="n">lambda</span><span class="o">,</span> <span class="n">alpha</span><span class="o">)</span></code></pre></div> +<span class="k">val</span> <span class="n">model</span> <span class="k">=</span> <span class="nc">ALS</span><span class="o">.</span><span class="n">trainImplicit</span><span class="o">(</span><span class="n">ratings</span><span class="o">,</span> <span class="n">rank</span><span class="o">,</span> <span class="n">numIterations</span><span class="o">,</span> <span class="n">lambda</span><span class="o">,</span> <span class="n">alpha</span><span class="o">)</span></code></pre></figure> </div> @@ -449,7 +449,7 @@ that is equivalent to the provided example in Scala is given below:</p> <p>Refer to the <a href="api/java/org/apache/spark/mllib/recommendation/ALS.html"><code>ALS</code> Java docs</a> for more details on the API.</p> - <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span> + <div 
class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span> <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.*</span><span class="o">;</span> <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span> @@ -458,8 +458,8 @@ that is equivalent to the provided example in Scala is given below:</p> <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.recommendation.Rating</span><span class="o">;</span> <span class="kn">import</span> <span class="nn">org.apache.spark.SparkConf</span><span class="o">;</span> -<span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">"Java Collaborative Filtering Example"</span><span class="o">);</span> -<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">);</span> +<span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">"Java Collaborative Filtering Example"</span><span class="o">);</span> +<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">);</span> <span class="c1">// Load and parse the data</span> <span class="n">String</span> <span class="n">path</span> <span class="o">=</span> <span class="s">"data/mllib/als/test.data"</span><span class="o">;</span> @@ 
-468,7 +468,7 @@ that is equivalent to the provided example in Scala is given below:</p> <span class="k">new</span> <span class="n">Function</span><span class="o"><</span><span class="n">String</span><span class="o">,</span> <span class="n">Rating</span><span class="o">>()</span> <span class="o">{</span> <span class="kd">public</span> <span class="n">Rating</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span> <span class="n">String</span><span class="o">[]</span> <span class="n">sarray</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">","</span><span class="o">);</span> - <span class="k">return</span> <span class="k">new</span> <span class="nf">Rating</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">sarray</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">sarray</span><span class="o">[</span><span class="mi">1</span><span class="o">]),</span> + <span class="k">return</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">sarray</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">sarray</span><span class="o">[</span><span class="mi">1</span><span class="o">]),</span> <span class="n">Double</span><span class="o">.</span><span class="na">parseDouble</span><span class="o">(</span><span 
class="n">sarray</span><span class="o">[</span><span class="mi">2</span><span class="o">]));</span> <span class="o">}</span> <span class="o">}</span> @@ -528,36 +528,36 @@ recommendation by measuring the Mean Squared Error of rating prediction.</p> <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.recommendation.ALS"><code>ALS</code> Python docs</a> for more details on the API.</p> - <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.recommendation</span> <span class="kn">import</span> <span class="n">ALS</span><span class="p">,</span> <span class="n">MatrixFactorizationModel</span><span class="p">,</span> <span class="n">Rating</span> + <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.recommendation</span> <span class="kn">import</span> <span class="n">ALS</span><span class="p">,</span> <span class="n">MatrixFactorizationModel</span><span class="p">,</span> <span class="n">Rating</span> -<span class="c"># Load and parse the data</span> -<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">"data/mllib/als/test.data"</span><span class="p">)</span> -<span class="n">ratings</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">','</span><span class="p">))</span>\ +<span class="c1"># Load and parse the data</span> +<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">"data/mllib/als/test.data"</span><span class="p">)</span> +<span class="n">ratings</span> 
<span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">))</span>\ <span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">Rating</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">l</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="nb">int</span><span class="p">(</span><span class="n">l</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="nb">float</span><span class="p">(</span><span class="n">l</span><span class="p">[</span><span class="mi">2</span><span class="p">])))</span> -<span class="c"># Build the recommendation model using Alternating Least Squares</span> +<span class="c1"># Build the recommendation model using Alternating Least Squares</span> <span class="n">rank</span> <span class="o">=</span> <span class="mi">10</span> <span class="n">numIterations</span> <span class="o">=</span> <span class="mi">10</span> <span class="n">model</span> <span class="o">=</span> <span class="n">ALS</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">ratings</span><span class="p">,</span> <span class="n">rank</span><span class="p">,</span> <span class="n">numIterations</span><span class="p">)</span> -<span class="c"># Evaluate the model on training data</span> +<span class="c1"># Evaluate the model on training data</span> <span class="n">testdata</span> <span class="o">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span 
class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span> <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predictAll</span><span class="p">(</span><span class="n">testdata</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="p">((</span><span class="n">r</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">r</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">r</span><span class="p">[</span><span class="mi">2</span><span class="p">]))</span> <span class="n">ratesAndPreds</span> <span class="o">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="p">((</span><span class="n">r</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">r</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">r</span><span class="p">[</span><span class="mi">2</span><span class="p">]))</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span> <span class="n">MSE</span> <span class="o">=</span> <span class="n">ratesAndPreds</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="p">(</span><span class="n">r</span><span 
class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">r</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">1</span><span class="p">])</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span> -<span class="k">print</span><span class="p">(</span><span class="s">"Mean Squared Error = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">MSE</span><span class="p">))</span> +<span class="k">print</span><span class="p">(</span><span class="s2">"Mean Squared Error = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">MSE</span><span class="p">))</span> -<span class="c"># Save and load model</span> -<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">"target/tmp/myCollaborativeFilter"</span><span class="p">)</span> -<span class="n">sameModel</span> <span class="o">=</span> <span class="n">MatrixFactorizationModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">"target/tmp/myCollaborativeFilter"</span><span class="p">)</span> +<span class="c1"># Save and load model</span> +<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">"target/tmp/myCollaborativeFilter"</span><span class="p">)</span> +<span class="n">sameModel</span> <span class="o">=</span> <span class="n">MatrixFactorizationModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span 
class="p">,</span> <span class="s2">"target/tmp/myCollaborativeFilter"</span><span class="p">)</span> </pre></div> <div><small>Find full example code at "examples/src/main/python/mllib/recommendation_example.py" in the Spark repo.</small></div> <p>If the rating matrix is derived from other source of information (i.e. it is inferred from other signals), you can use the trainImplicit method to get better results.</p> - <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Build the recommendation model using Alternating Least Squares based on implicit ratings</span> -<span class="n">model</span> <span class="o">=</span> <span class="n">ALS</span><span class="o">.</span><span class="n">trainImplicit</span><span class="p">(</span><span class="n">ratings</span><span class="p">,</span> <span class="n">rank</span><span class="p">,</span> <span class="n">numIterations</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.01</span><span class="p">)</span></code></pre></div> + <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># Build the recommendation model using Alternating Least Squares based on implicit ratings</span> +<span class="n">model</span> <span class="o">=</span> <span class="n">ALS</span><span class="o">.</span><span class="n">trainImplicit</span><span class="p">(</span><span class="n">ratings</span><span class="p">,</span> <span class="n">rank</span><span class="p">,</span> <span class="n">numIterations</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.01</span><span class="p">)</span></code></pre></figure> </div>