[ 
https://issues.apache.org/jira/browse/SPARK-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng closed SPARK-6711.
--------------------------------
    Resolution: Duplicate

> Support parallelized online matrix factorization for Collaborative Filtering 
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-6711
>                 URL: https://issues.apache.org/jira/browse/SPARK-6711
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, Streaming
>            Reporter: Chunnan Yao
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> Online collaborative filtering (CF) has been widely used and studied. 
> Retraining a CF model from scratch every time new data arrives is very 
> inefficient 
> (http://stackoverflow.com/questions/27734329/apache-spark-incremental-training-of-als-model).
> However, the Spark community has had little discussion of collaborative 
> filtering on streaming data. Given streaming k-means, streaming logistic 
> regression, and the ongoing work on incremental model training for the 
> Naive Bayes classifier (SPARK-4144), we think it is worthwhile to consider 
> streaming collaborative filtering support in MLlib. 
> We have been looking into this issue over the past week. We 
> plan to follow this paper
> (https://www.cs.utexas.edu/~cjohnson/ParallelCollabFilt.pdf). It is based on 
> SGD instead of ALS, which makes it easier to apply to streaming data. 
> The authors of this paper have also implemented their algorithm as a 
> GitHub project, based on Storm:
> https://github.com/MrChrisJohnson/CollabStream
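
For illustration only: a minimal sketch of why an SGD-based factorization fits streaming data more naturally than ALS. It is not taken from the cited paper, from CollabStream, or from MLlib. The names `OnlineMF`, `sgd_update`, `observe`, and `predict`, and the learning-rate and regularization defaults, are hypothetical. Each incoming rating touches only one user vector and one item vector, so the model can be updated per record without retraining from scratch.

```python
import random


def sgd_update(user_vec, item_vec, rating, lr=0.05, reg=0.02):
    """Apply one regularized SGD step for a single rating, in place.

    Returns the prediction error measured before the update.
    """
    pred = sum(u * v for u, v in zip(user_vec, item_vec))
    err = rating - pred
    for k in range(len(user_vec)):
        # Read both old values first so the two updates use the same state.
        u, v = user_vec[k], item_vec[k]
        user_vec[k] = u + lr * (err * v - reg * u)
        item_vec[k] = v + lr * (err * u - reg * v)
    return err


class OnlineMF:
    """Hypothetical single-process online matrix factorization model."""

    def __init__(self, rank=8, seed=0):
        self.rank = rank
        self.rng = random.Random(seed)
        self.users = {}  # user id -> latent factor vector
        self.items = {}  # item id -> latent factor vector

    def _vec(self, table, key):
        # Unseen users or items get a small random vector on first sight.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.rank)]
        return table[key]

    def observe(self, user, item, rating):
        """Consume one (user, item, rating) record from the stream."""
        return sgd_update(self._vec(self.users, user),
                          self._vec(self.items, item), rating)

    def predict(self, user, item):
        u = self._vec(self.users, user)
        v = self._vec(self.items, item)
        return sum(a * b for a, b in zip(u, v))
```

In a streaming job, `observe` would be called once per arriving rating. An ALS solver, by contrast, alternates least-squares solves over all ratings for each user or item, which is why it is usually rerun over the full data set. How the factor vectors would be partitioned across workers is left open here.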



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
