[ 
https://issues.apache.org/jira/browse/SPARK-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896033#comment-15896033
 ] 

Daniel Li commented on SPARK-6407:
----------------------------------

Appreciate the quick reply, [~srowen].

Yeah, we'd be recomputing them, but not from scratch, since we'd be starting 
with the already-optimized _U_ and _V_.  It would likely take only one or two 
iterations to reconverge.  Would this still be considered too expensive?
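
To make the warm-start idea concrete, here is a rough sketch in plain Python 
(not Spark, and not MLlib's API).  Everything in it is an assumption made for 
illustration: the toy ratings, the helper names (solve, ridge_row, als_sweep, 
sweeps_until_converged), plain L2 regularization with lam = 0.1, and the 
stopping tolerance.  It runs ALS to convergence, adds one rating, and then 
continues from the converged _U_ and _V_ instead of reinitializing.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_row(fixed_rows, ratings, lam):
    """Regularized least-squares row, holding the other side's rows fixed."""
    k = len(fixed_rows[0])
    A = [[sum(f[a] * f[b] for f in fixed_rows) + (lam if a == b else 0.0)
          for b in range(k)] for a in range(k)]
    rhs = [sum(f[a] * r for f, r in zip(fixed_rows, ratings)) for a in range(k)]
    return solve(A, rhs)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(R, U, V, lam):
    """Squared error on observed ratings plus L2 penalty on all factors."""
    err = sum((r - dot(U[u], V[i])) ** 2 for (u, i), r in R.items())
    reg = sum(x * x for row in U for x in row) + sum(x * x for row in V for x in row)
    return err + lam * reg

def als_sweep(R, U, V, lam):
    """One ALS iteration: re-solve every user row, then every item row."""
    for u in range(len(U)):
        obs = [(i, r) for (uu, i), r in R.items() if uu == u]
        if obs:
            U[u] = ridge_row([V[i] for i, _ in obs], [r for _, r in obs], lam)
    for i in range(len(V)):
        obs = [(u, r) for (u, ii), r in R.items() if ii == i]
        if obs:
            V[i] = ridge_row([U[u] for u, _ in obs], [r for _, r in obs], lam)

def sweeps_until_converged(R, U, V, lam=0.1, tol=1e-6, max_sweeps=200):
    """Run sweeps in place until the loss improves by less than tol."""
    prev = loss(R, U, V, lam)
    for n in range(1, max_sweeps + 1):
        als_sweep(R, U, V, lam)
        cur = loss(R, U, V, lam)
        if prev - cur < tol:
            return n
        prev = cur
    return max_sweeps

def random_factors(n, k, rng):
    return [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: (user, item) -> rating.
    R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
         (1, 2): 1.0, (2, 1): 2.0, (2, 2): 5.0}
    U, V = random_factors(3, 2, rng), random_factors(3, 2, rng)
    cold = sweeps_until_converged(R, U, V)   # random initialization
    R[(1, 1)] = 2.0                          # one new rating arrives
    warm = sweeps_until_converged(R, U, V)   # continue from converged U, V
    print("cold-start sweeps:", cold, "| warm-start sweeps:", warm)
```

A toy run like this says nothing about cost at Spark scale; it only shows 
what "not from scratch" means mechanically.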

What makes me hesitate about fold-in updating is that its assumption, that 
only the corresponding user row and item row will change, may be an 
oversimplification.  There is, of course, a "rippling out" effect: all items 
the user rated previously need to be updated, then all users who rated any of 
those items need updating, and so on.  Then again, even if we take this 
rippling into account, the computation may not be too expensive, since a 
single update likely won't affect the RMSE enough to delay convergence.  (I 
haven't worked out the math showing this, though; it's just a hunch.)
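
For reference, a fold-in step and the first ripple might look roughly like 
this in plain Python (again not Spark or MLlib code).  The helper names 
(solve, fold_in_row), the toy factors and ratings, and the regularization 
value are all hypothetical.  The idea is to re-solve only the affected user 
row against fixed item factors, then re-solve the newly rated item's row 
against the updated user factors.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fold_in_row(fixed_rows, ratings, lam=0.1):
    """Re-solve one factor row by regularized least squares.

    fixed_rows: factor vectors of the other side (items when updating a
    user, users when updating an item), one per observed rating.
    ratings: the observed ratings, in the same order.
    """
    k = len(fixed_rows[0])
    A = [[sum(f[a] * f[b] for f in fixed_rows) + (lam if a == b else 0.0)
          for b in range(k)] for a in range(k)]
    rhs = [sum(f[a] * r for f, r in zip(fixed_rows, ratings)) for a in range(k)]
    return solve(A, rhs)

if __name__ == "__main__":
    # Toy factors and ratings, (user, item) -> rating.
    U = [[2.0, 3.0], [1.0, 1.0]]
    V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    R = {(0, 0): 2.0, (0, 1): 3.0, (1, 2): 2.0}

    R[(0, 2)] = 4.0  # user 0 rates item 2

    # Fold-in: only user 0's row is re-solved, with V held fixed.
    items = [i for (u, i) in R if u == 0]
    U[0] = fold_in_row([V[i] for i in items], [R[(0, i)] for i in items])

    # First ripple: item 2's row now depends on the changed U[0].
    users = [u for (u, i) in R if i == 2]
    V[2] = fold_in_row([U[u] for u in users], [R[(u, 2)] for u in users])

    # Further ripples would revisit the other items user 0 rated, then the
    # users who rated those items, and so on.
    print("updated U[0]:", U[0])
    print("updated V[2]:", V[2])
```

Stopping after the first step is exactly the simplification in question; how 
far the ripple has to be followed is the open part.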

Do you have any insights into this?

> Streaming ALS for Collaborative Filtering
> -----------------------------------------
>
>                 Key: SPARK-6407
>                 URL: https://issues.apache.org/jira/browse/SPARK-6407
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams
>            Reporter: Felix Cheung
>            Priority: Minor
>
> Like MLlib's ALS implementation for recommendation, but applied to streaming.
> Similar to streaming linear regression and logistic regression, could we apply 
> gradient updates to batches of data and reuse the existing MLlib implementation?



