[ https://issues.apache.org/jira/browse/SPARK-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896033#comment-15896033 ]
Daniel Li commented on SPARK-6407:
----------------------------------

Appreciate the quick reply, [~srowen].

Yeah, we'd be recomputing them, but not from scratch, since we'd be starting with the already-optimized _U_ and _V_. It would likely take only one or two iterations to reconverge. Would this still be considered too expensive?

What makes me hesitate about fold-in updating is its assumption that only the corresponding user row and item row will change, which may be too simplifying. There is, of course, a "rippling out" effect: all items the user previously rated need to be updated, then all users who rated any of those items need updating, and so on.

Then again, even if we take this rippling into account, the computation may not be too expensive, since a single update likely won't affect the RMSE enough to delay convergence. (I haven't worked out the math showing this, though; it's just a hunch.)

Do you have any insights into this?

> Streaming ALS for Collaborative Filtering
> -----------------------------------------
>
>                 Key: SPARK-6407
>                 URL: https://issues.apache.org/jira/browse/SPARK-6407
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams
>            Reporter: Felix Cheung
>            Priority: Minor
>
> Like MLLib's ALS implementation for recommendation, and applying to streaming.
> Similar to streaming linear regression, logistic regression, could we apply
> gradient updates to batches of data and reuse existing MLLib implementation?
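
For readers unfamiliar with the fold-in update discussed in the comment, here is a minimal sketch of what it might look like for a single user: the item factors _V_ are held fixed and only that user's row of _U_ is re-solved as a regularized least-squares problem. This is not Spark's or MLlib's implementation; the names `fold_in_user`, `solve_linear`, and `reg` are hypothetical, plain L2 regularization is an assumption, and the "rippling out" to other rows is deliberately ignored.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix from copies so the inputs are not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fold_in_user(item_factors, ratings, reg=0.1):
    """Re-solve one user's factor vector with the item factors held fixed.

    item_factors: dict mapping item id -> list of k floats (a row of V)
    ratings:      dict mapping item id -> this user's rating of that item
    reg:          L2 regularization strength (assumed; reg > 0 keeps the
                  system well posed even with very few ratings)
    """
    k = len(next(iter(item_factors.values())))
    # Normal equations: (sum_i v_i v_i^T + reg * I) x = sum_i r_i v_i
    a = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item, r in ratings.items():
        v = item_factors[item]
        for i in range(k):
            b[i] += r * v[i]
            for j in range(k):
                a[i][j] += v[i] * v[j]
    return solve_linear(a, b)


if __name__ == "__main__":
    # Toy data: two items with rank-2 factors, one user with two ratings.
    V = {"item_a": [1.0, 1.0], "item_b": [1.0, 0.0]}
    user_row = fold_in_user(V, {"item_a": 3.0, "item_b": 1.0}, reg=0.1)
    print(user_row)
```

The warm-start alternative raised in the comment would instead run one or two full ALS sweeps over all rows, starting from the current _U_ and _V_; the sketch above covers only the cheaper single-row case, which is exactly where the rippling concern applies.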