[ 
https://issues.apache.org/jira/browse/SPARK-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481711#comment-14481711
 ] 

Xiangrui Meng commented on SPARK-6407:
--------------------------------------

Attached the comment from Chunnan Yao in SPARK-6711:

Online collaborative filtering (CF) has been widely used and studied. 
Retraining a CF model from scratch every time new data comes in is very 
inefficient 
(http://stackoverflow.com/questions/27734329/apache-spark-incremental-training-of-als-model).
However, in the Spark community we see little discussion of collaborative 
filtering on streaming data. Given streaming k-means, streaming logistic 
regression, and the ongoing work on incremental model training for the Naive 
Bayes classifier (SPARK-4144), we think it is worthwhile to consider streaming 
collaborative filtering support in MLlib. 

We have been considering this issue over the past week. We plan to base our 
work on this paper 
(https://www.cs.utexas.edu/~cjohnson/ParallelCollabFilt.pdf). It uses SGD 
rather than ALS, and SGD is easier to adapt to streaming data. 

Fortunately, the authors of this paper have implemented their algorithm as a 
GitHub project based on Storm:
https://github.com/MrChrisJohnson/CollabStream
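
To illustrate why SGD fits streaming data, here is a minimal sketch of 
matrix factorization updated one rating at a time. It is a generic 
single-machine SGD update, not the algorithm from the paper or the 
CollabStream code, and it does not use any Spark API. The class name 
StreamingMF and the parameters rank, lr, and reg are hypothetical names 
chosen for this example.

```python
import random


class StreamingMF:
    """Hypothetical sketch: matrix factorization trained by per-rating SGD."""

    def __init__(self, rank=8, lr=0.05, reg=0.02, seed=0):
        self.rank = rank          # number of latent factors (assumed value)
        self.lr = lr              # SGD step size (assumed value)
        self.reg = reg            # L2 regularization weight (assumed value)
        self.rng = random.Random(seed)
        self.user_factors = {}    # user id -> list of floats
        self.item_factors = {}    # item id -> list of floats

    def _vec(self, table, key):
        # Lazily create a small random factor vector for unseen users/items,
        # so the model can grow as new ids arrive in the stream.
        if key not in table:
            table[key] = [self.rng.uniform(-0.1, 0.1) for _ in range(self.rank)]
        return table[key]

    def predict(self, user, item):
        p = self._vec(self.user_factors, user)
        q = self._vec(self.item_factors, item)
        return sum(a * b for a, b in zip(p, q))

    def update(self, user, item, rating):
        """Apply one SGD step for a single (user, item, rating) event.

        Returns the prediction error measured before the step.
        """
        p = self._vec(self.user_factors, user)
        q = self._vec(self.item_factors, item)
        err = rating - sum(a * b for a, b in zip(p, q))
        for k in range(self.rank):
            pk, qk = p[k], q[k]
            # Gradient step on squared error with L2 regularization.
            p[k] += self.lr * (err * qk - self.reg * pk)
            q[k] += self.lr * (err * pk - self.reg * qk)
        return err


if __name__ == "__main__":
    model = StreamingMF()
    # A toy stream of (user, item, rating) events, replayed a few times.
    stream = [("u1", "i1", 5.0), ("u1", "i2", 1.0), ("u2", "i1", 4.0)] * 50
    for user, item, rating in stream:
        model.update(user, item, rating)
    print(model.predict("u1", "i1"))
```

Each incoming rating touches only one user vector and one item vector, so 
the model can absorb new data without a full retrain. An ALS step, by 
contrast, re-solves a least-squares problem over all ratings of a user or 
item.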

> Streaming ALS for Collaborative Filtering
> -----------------------------------------
>
>                 Key: SPARK-6407
>                 URL: https://issues.apache.org/jira/browse/SPARK-6407
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Felix Cheung
>            Priority: Minor
>
> Like MLlib's ALS implementation for recommendation, but applied to streaming 
> data. Similar to streaming linear regression and logistic regression, could 
> we apply gradient updates to batches of data and reuse the existing MLlib 
> implementation?


