[ https://issues.apache.org/jira/browse/SPARK-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895589#comment-15895589 ]
Daniel Li commented on SPARK-6407:
----------------------------------

Reviving this thread, since I'm interested in implementing streaming CF for Spark.

bq. Using ALS for online updates is expensive.

Recomputing the factor matrices _U_ and _V_ from scratch for every update would be terribly expensive, but what about keeping _U_ and _V_ around and simply recomputing another round or two after each new rating comes in? The algorithm would simply be continually following a moving optimum. I can't imagine the RMSE changing much due to small updates if we use a convergence threshold _à la_ [Y. Zhou, et al., "Large-Scale Parallel Collaborative Filtering for the Netflix Prize"|http://dl.acm.org/citation.cfm?id=1424269] instead of a fixed number of iterations. (In fact, since calculating _(U^T) * V_ would probably take a nontrivial slice of time, new ratings that arrive during a round of computation could be "batched" into the next round, increasing efficiency.)

Thoughts?

> Streaming ALS for Collaborative Filtering
> -----------------------------------------
>
>                 Key: SPARK-6407
>                 URL: https://issues.apache.org/jira/browse/SPARK-6407
>             Project: Spark
>          Issue Type: New Feature
>          Components: DStreams
>            Reporter: Felix Cheung
>            Priority: Minor
>
> Like MLlib's ALS implementation for recommendation, applied to streaming. Similar to
> streaming linear regression and logistic regression, could we apply gradient updates
> to batches of data and reuse the existing MLlib implementation?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
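To make the warm-start idea concrete, here is a minimal NumPy sketch (hypothetical code, not Spark's ALS API, and dense rather than distributed): run batch ALS to a convergence threshold, then, when a new rating arrives, keep the existing _U_ and _V_ and run just a couple more rounds instead of refactoring from scratch.

```python
import numpy as np

def als_step(R, mask, U, V, lam=0.1):
    """One full ALS round on a dense rating matrix R with an
    observed-entry boolean mask; returns the updated U, V in place.
    (Plain Tikhonov regularization; Zhou et al. weight lam by the
    number of ratings per row, a detail omitted here.)"""
    k = U.shape[1]
    for i in range(R.shape[0]):            # solve each user row given V
        obs = mask[i]
        if obs.any():
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + lam * np.eye(k), Vo.T @ R[i, obs])
    for j in range(R.shape[1]):            # solve each item row given U
        obs = mask[:, j]
        if obs.any():
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + lam * np.eye(k), Uo.T @ R[obs, j])
    return U, V

def rmse(R, mask, U, V):
    pred = U @ V.T
    return np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))

def reg_loss(R, mask, U, V, lam=0.1):
    return np.sum((R - U @ V.T)[mask] ** 2) + lam * (np.sum(U**2) + np.sum(V**2))

rng = np.random.default_rng(0)
m, n, k = 20, 15, 3                        # toy sizes, purely illustrative
R = rng.random((m, n)) * 5
mask = rng.random((m, n)) < 0.5
U, V = rng.random((m, k)), rng.random((n, k))

# Batch phase: iterate until the RMSE improvement falls below a threshold,
# rather than running a fixed number of iterations.
prev = np.inf
while True:
    U, V = als_step(R, mask, U, V)
    cur = rmse(R, mask, U, V)
    if prev - cur < 1e-4:
        break
    prev = cur

# "Streaming" phase: a new rating arrives. Keep the warm U, V and run
# only two more rounds -- the factors just follow the moving optimum.
mask[0, 0] = True
R[0, 0] = 4.0
loss_before = reg_loss(R, mask, U, V)
for _ in range(2):
    U, V = als_step(R, mask, U, V)
loss_after = reg_loss(R, mask, U, V)
rmse_after = rmse(R, mask, U, V)
```

Because each ALS half-step exactly minimizes the regularized loss over its block, the warm-start rounds can only decrease that loss, which is why a round or two after each update should suffice when the data changes by a single rating.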