Chunnan Yao created SPARK-6711:
----------------------------------

             Summary: Support parallelized online matrix factorization for 
Collaborative Filtering 
                 Key: SPARK-6711
                 URL: https://issues.apache.org/jira/browse/SPARK-6711
             Project: Spark
          Issue Type: Improvement
          Components: MLlib, Streaming
            Reporter: Chunnan Yao


On-line Collaborative Filtering(CF) has been widely used and studied. To 
re-train a CF model from scratch every time when new data comes in is very 
inefficient 
(http://stackoverflow.com/questions/27734329/apache-spark-incremental-training-of-als-model).
 However, in Spark community we see few discussion about collaborative 
filtering on streaming data. Given streaming k-means, streaming logistic 
regression, and the on-going incremental model training of Naive Bayes 
Classifier (SPARK-4144), we think it is meaningful to consider streaming 
Collaborative Filtering support on MLlib. 

We have already been considering about this issue during the past week. We plan 
to refer to this paper
(https://www.cs.utexas.edu/~cjohnson/ParallelCollabFilt.pdf). It is based on 
SGD instead of ALS, which is easier to be tackled under streaming data. 

Fortunately, the authors of this paper have implemented their algorithm as a 
Github Project, based on Storm:
https://github.com/MrChrisJohnson/CollabStream




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to