This is being discussed in https://issues.apache.org/jira/browse/SPARK-6407. Let's move the discussion there. Thanks for providing references! -Xiangrui
On Sun, Apr 5, 2015 at 11:48 PM, Chunnan Yao <yaochun...@gmail.com> wrote:
> Online Collaborative Filtering (CF) has been widely used and studied. Re-training a CF model
> from scratch every time new data arrives is very inefficient
> (http://stackoverflow.com/questions/27734329/apache-spark-incremental-training-of-als-model).
> However, in the Spark community we see little discussion about collaborative filtering on
> streaming data. Given streaming k-means, streaming logistic regression, and the ongoing
> incremental model training of the Naive Bayes classifier (SPARK-4144), we think it is
> worthwhile to consider streaming Collaborative Filtering support in MLlib.
>
> I've created an issue on JIRA (SPARK-6711) for discussion. We suggest following this paper
> (https://www.cs.utexas.edu/~cjohnson/ParallelCollabFilt.pdf). It is based on SGD instead of
> ALS, which is easier to adapt to streaming data.
>
> Fortunately, the authors of this paper have implemented their algorithm as a GitHub project
> based on Storm: https://github.com/MrChrisJohnson/CollabStream
>
> Please don't hesitate to share your opinions on this issue and our planned approach. We'd
> like to work on this in the next few weeks.
>
> -----
> Feel the sparking Spark!
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Support-parallelized-online-matrix-factorization-for-Collaborative-Filtering-tp11413.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
> ---------------------------------------------------------------------
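[For context: the reason SGD suits streaming better than ALS is that each observed rating touches only two factor vectors, whereas ALS re-solves least-squares subproblems over all ratings. A minimal stdlib-only sketch of that per-rating update (not Spark or CollabStream code; factor sizes, learning rate, and the toy event stream are illustrative):

```python
import random

def sgd_update(P, Q, u, i, r, lr=0.01, reg=0.1):
    """Apply one SGD step for a single rating r of item i by user u.

    Only P[u] and Q[i] are touched, so a stream of (u, i, r) events can be
    folded into the model incrementally, without re-solving over all ratings.
    """
    pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
    err = r - pred
    for k in range(len(P[u])):
        pu, qi = P[u][k], Q[i][k]
        # Gradient of (err^2 + reg * (|P[u]|^2 + |Q[i]|^2)) w.r.t. each factor.
        P[u][k] += lr * (err * qi - reg * pu)
        Q[i][k] += lr * (err * pu - reg * qi)

# Toy setup: 3 users, 3 items, rank-4 factors initialized near zero.
random.seed(0)
rank = 4
P = {u: [random.uniform(-0.1, 0.1) for _ in range(rank)] for u in range(3)}
Q = {i: [random.uniform(-0.1, 0.1) for _ in range(rank)] for i in range(3)}

# Simulated event stream: the same rating repeated drives the model's
# prediction for (user 0, item 0) toward 5.0.
for _ in range(2000):
    sgd_update(P, Q, 0, 0, 5.0)
```

With a small learning rate the update is stable, and regularization keeps the factors bounded; in a distributed setting the paper's approach partitions users and items so these per-rating updates can run in parallel. -Ed.]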