In the first instance, I'm suggesting that ALS in Spark could expose a run() method that accepts a previous MatrixFactorizationModel and uses its product factors as the initial state, instead of a random initialization. If anybody seconds that idea, I'll make a PR.
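To make the warm-start idea concrete, here is a toy, pure-Python sketch of ALS that optionally seeds a new run with a previous model's item factors rather than random ones. This is not MLlib's API or implementation; the function names, rank-2 restriction, and rating matrix are all made up for illustration.

```python
# Toy alternating least squares on a small dense rating matrix, showing how
# a previous model's factors can seed a new run (warm start) instead of a
# random initialization. Pure Python, rank 2; names are illustrative only.
import random

RANK, LAMBDA = 2, 0.1

def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (b[1] * A[0][0] - b[0] * A[1][0]) / det]

def update(fixed, R, row_major):
    """Holding one side's factors fixed, solve for the other side's factors
    by regularized least squares (the normal equations per row/column)."""
    n = len(R) if row_major else len(R[0])
    out = []
    for i in range(n):
        # (F^T F + lambda I) x = F^T r
        A = [[LAMBDA if p == q else 0.0 for q in range(RANK)] for p in range(RANK)]
        b = [0.0] * RANK
        ratings = R[i] if row_major else [row[i] for row in R]
        for j, r in enumerate(ratings):
            f = fixed[j]
            for p in range(RANK):
                b[p] += f[p] * r
                for q in range(RANK):
                    A[p][q] += f[p] * f[q]
        out.append(solve2(A, b))
    return out

def als(R, iterations, item_factors=None):
    """Run ALS; if item_factors is given, warm-start from a previous model."""
    if item_factors is None:
        random.seed(42)
        item_factors = [[random.random() for _ in range(RANK)]
                        for _ in range(len(R[0]))]
    for _ in range(iterations):
        user_factors = update(item_factors, R, row_major=True)
        item_factors = update(user_factors, R, row_major=False)
    return user_factors, item_factors

def rmse(R, users, items):
    err = sum((R[u][i] - sum(a * b for a, b in zip(users[u], items[i]))) ** 2
              for u in range(len(R)) for i in range(len(R[0])))
    return (err / (len(R) * len(R[0]))) ** 0.5

R = [[5.0, 4.0, 1.0], [4.0, 5.0, 1.0], [1.0, 1.0, 5.0], [1.0, 2.0, 4.0]]
_, items = als(R, iterations=10)                            # initial batch fit
users2, items2 = als(R, iterations=2, item_factors=items)   # warm restart
print(round(rmse(R, users2, items2), 3))
```

The point of the warm restart is the last line: having converged once, far fewer iterations are needed after new data arrives, which is what "incremental batch rebuilding" buys you.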
The second idea is just fold-in:
http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares/14

Whether you do this or something like SGD, inside or outside Spark, depends on your requirements, I think.

On Sat, Jan 3, 2015 at 12:04 PM, Wouter Samaey <wouter.sam...@storefront.be> wrote:

> Do you know a place where I could find a sample or tutorial for this?
>
> I'm still very new at this, and struggling a bit...
>
> Thanks in advance
>
> Wouter
>
> Sent from my iPhone.
>
> On 03 Jan 2015, at 10:36, Sean Owen <so...@cloudera.com> wrote:
>
> Yes, it is easy to simply start a new factorization from the current model
> solution. It works well. That's more like incremental *batch* rebuilding
> of the model. That is not in MLlib, but fairly trivial to add.
>
> You can certainly 'fold in' new data to approximately update the model
> with one new datum too, which you can find described online. This is not
> quite the same idea as streaming SGD. I'm not sure this fits the RDD model
> well, since it entails updating one element at a time, but mini-batch
> could be reasonable.
>
> On Jan 3, 2015 5:29 AM, "Peng Cheng" <rhw...@gmail.com> wrote:
>>
>> I was under the impression that ALS wasn't designed for it :-< The famous
>> eBay online recommender uses SGD.
>> However, you can try using the previous model as a starting point, and
>> gradually reduce the number of iterations after the model stabilizes. I
>> have never verified this idea, so you need to at least cross-validate it
>> before putting it into production.
>>
>> On 2 January 2015 at 04:40, Wouter Samaey <wouter.sam...@storefront.be>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I'm curious about MLlib and whether it is possible to do incremental
>>> training on the ALSModel.
>>>
>>> Usually training is run first, and then you can query. But in my case,
>>> data is collected in real-time, and I want the predictions of my
>>> ALSModel to consider the latest data without a complete re-training
>>> phase.
>>>
>>> I've checked out these resources, but could not find any info on how to
>>> solve this:
>>> https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html
>>> http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html
>>>
>>> My question fits in a larger picture where I'm using PredictionIO, and
>>> this in turn is based on Spark.
>>>
>>> Thanks in advance for any advice!
>>>
>>> Wouter
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-possible-to-do-incremental-training-using-ALSModel-MLlib-tp20942.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org