[ https://issues.apache.org/jira/browse/SPARK-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895589#comment-15895589 ]
Daniel Li commented on SPARK-6407:
----------------------------------

Reviving this thread, since I'm interested in implementing streaming CF for Spark.

bq. Using ALS for online updates is expensive.

Recomputing the factor matrices _U_ and _V_ from scratch for every update would be terribly expensive, but what about keeping _U_ and _V_ around and simply recomputing another round or two after each new rating comes in? The algorithm would simply be continually following a moving optimum. I can't imagine the RMSE changing much due to small updates if we use a convergence threshold _à la_ [Y. Zhou, et al., "Large-Scale Parallel Collaborative Filtering for the Netflix Prize"|http://dl.acm.org/citation.cfm?id=1424269] instead of a fixed number of iterations. (In fact, since calculating _(U^T) * V_ would probably take a nontrivial slice of time, new ratings that arrive during a round of computation could be "batched" into the next round, increasing efficiency.)

Thoughts?

> Streaming ALS for Collaborative Filtering
> -----------------------------------------
>
>                 Key: SPARK-6407
>                 URL: https://issues.apache.org/jira/browse/SPARK-6407
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams
>            Reporter: Felix Cheung
>            Priority: Minor
>
> Like MLlib's ALS implementation for recommendation, applied to streaming. Similar to
> streaming linear regression and logistic regression, could we apply gradient updates
> to batches of data and reuse the existing MLlib implementation?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
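To make the warm-start idea concrete, here is a minimal NumPy sketch (hypothetical code, not Spark's ALS API, and dense rather than distributed): run batch ALS to a convergence threshold, then, when a new rating arrives, keep the existing _U_ and _V_ and run just a couple more rounds instead of refactoring from scratch.

```python
import numpy as np

def als_step(R, mask, U, V, lam=0.1):
    """One full ALS round on a dense rating matrix R with an
    observed-entry boolean mask; returns the updated U, V in place.
    (Plain Tikhonov regularization; Zhou et al. weight lam by the
    number of ratings per row, a detail omitted here.)"""
    k = U.shape[1]
    for i in range(R.shape[0]):            # solve each user row given V
        obs = mask[i]
        if obs.any():
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + lam * np.eye(k), Vo.T @ R[i, obs])
    for j in range(R.shape[1]):            # solve each item row given U
        obs = mask[:, j]
        if obs.any():
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + lam * np.eye(k), Uo.T @ R[obs, j])
    return U, V

def rmse(R, mask, U, V):
    pred = U @ V.T
    return np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))

def reg_loss(R, mask, U, V, lam=0.1):
    return np.sum((R - U @ V.T)[mask] ** 2) + lam * (np.sum(U**2) + np.sum(V**2))

rng = np.random.default_rng(0)
m, n, k = 20, 15, 3                        # toy sizes, purely illustrative
R = rng.random((m, n)) * 5
mask = rng.random((m, n)) < 0.5
U, V = rng.random((m, k)), rng.random((n, k))

# Batch phase: iterate until the RMSE improvement falls below a threshold,
# rather than running a fixed number of iterations.
prev = np.inf
while True:
    U, V = als_step(R, mask, U, V)
    cur = rmse(R, mask, U, V)
    if prev - cur < 1e-4:
        break
    prev = cur

# "Streaming" phase: a new rating arrives. Keep the warm U, V and run
# only two more rounds -- the factors just follow the moving optimum.
mask[0, 0] = True
R[0, 0] = 4.0
loss_before = reg_loss(R, mask, U, V)
for _ in range(2):
    U, V = als_step(R, mask, U, V)
loss_after = reg_loss(R, mask, U, V)
rmse_after = rmse(R, mask, U, V)
```

Because each ALS half-step exactly minimizes the regularized loss over its block, the warm-start rounds can only decrease that loss, which is why a round or two after each update should suffice when the data changes by a single rating.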