Re: Is it possible to do incremental training using ALSModel (MLlib)?
Hi, do you have any updates?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-possible-to-do-incremental-training-using-ALSModel-MLlib-tp20942p22296.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: Is it possible to do incremental training using ALSModel (MLlib)?
You’re right, Nick! This function does exactly that. Sean has already helped me greatly. Thanks for your reply.

Wouter Samaey
Zaakvoerder Storefront BVBA

Tel: +32 472 72 83 07
Web: http://storefront.be

LinkedIn: http://www.linkedin.com/in/woutersamaey
Re: Is it possible to do incremental training using ALSModel (MLlib)?
As I recall, Oryx (the old version, and I assume the new one too) provides something like this:
http://cloudera.github.io/oryx/apidocs/com/cloudera/oryx/als/common/OryxRecommender.html#recommendToAnonymous-java.lang.String:A-float:A-int-

though Sean will be more on top of that than me :)
Re: Is it possible to do incremental training using ALSModel (MLlib)?
One other idea was that I don’t need to re-train the model, but simply pass all the current user’s recent ratings (including ones created after the training) to the existing model…

Is this a valid option?

Wouter Samaey
Zaakvoerder Storefront BVBA

Tel: +32 472 72 83 07
Web: http://storefront.be

LinkedIn: http://www.linkedin.com/in/woutersamaey
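The idea above can be sketched outside MLlib: with the trained item factors held fixed, a new or anonymous user's vector can be computed from their recent ratings by solving one small regularized least-squares problem, and then scored against all items. This is a minimal numpy illustration of the approach, not an MLlib API; the function names and the regularization value are my own assumptions.

```python
import numpy as np

def fold_in_user(item_factors, rated_items, ratings, reg=0.1):
    """Solve a small regularized least-squares problem for a user vector,
    using only the (fixed) item factors and the user's known ratings."""
    Y = item_factors[rated_items]            # factors of the rated items
    k = Y.shape[1]
    A = Y.T @ Y + reg * np.eye(k)            # (Y'Y + lambda*I), k-by-k
    b = Y.T @ np.asarray(ratings, float)     # Y'r
    return np.linalg.solve(A, b)

def recommend(item_factors, user_vec, exclude, n=3):
    """Score every item against the user vector and return the top-n
    item indices, skipping items the user already rated."""
    scores = item_factors @ user_vec
    scores[list(exclude)] = -np.inf
    return np.argsort(-scores)[:n]

rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 5))                 # stand-in for trained item factors
u = fold_in_user(Y, [1, 4, 7], [5.0, 4.0, 1.0])
top = recommend(Y, u, exclude={1, 4, 7})
```

Because the solve is only k-by-k (the factor rank), this is cheap enough to run per request at serving time, which is essentially what Oryx's `recommendToAnonymous` does.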
Re: Is it possible to do incremental training using ALSModel (MLlib)?
In the first instance, I'm suggesting that ALS in Spark could perhaps expose a run() method that accepts a previous MatrixFactorizationModel, and uses the product factors from it as the initial state instead. If anybody seconds that idea, I'll make a PR.

The second idea is just fold-in:
http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares/14

Whether you do this or something like SGD, inside or outside Spark, depends on your requirements, I think.
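The fold-in idea can be sketched in a few lines: cache the small (Y'Y)^-1 matrix once after training, then nudge an existing user vector by each new rating projected through the item factor space. This is one common formulation, shown here as a hedged numpy sketch rather than MLlib code; the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(50, 8))         # item factors from a previous ALS run
YtY_inv = np.linalg.inv(Y.T @ Y)     # small k-by-k inverse, cheap to cache

def fold_in(user_vec, item_id, rating):
    """One-datum fold-in: approximately update the user vector for one new
    (item, rating) observation by projecting the rating through the fixed
    item factor space, without re-running ALS."""
    return user_vec + rating * (YtY_inv @ Y[item_id])

u = rng.normal(size=8)
before = Y[3] @ u                    # predicted score before the new rating
u2 = fold_in(u, 3, 5.0)              # fold in a strong positive rating for item 3
after = Y[3] @ u2                    # score after: higher, since (Y'Y)^-1 is PD
```

The update is approximate (it ignores the effect on item factors entirely), which is why it pairs naturally with periodic full batch rebuilds.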
Re: Is it possible to do incremental training using ALSModel (MLlib)?
Do you know a place where I could find a sample or tutorial for this?

I'm still very new at this, and struggling a bit...

Thanks in advance,

Wouter

Sent from my iPhone.
Re: Is it possible to do incremental training using ALSModel (MLlib)?
Yes, it is easy to simply start a new factorization from the current model solution. It works well. That's more like incremental *batch* rebuilding of the model. That is not in MLlib, but fairly trivial to add.

You can certainly 'fold in' new data to approximately update with one new datum too, which you can find online. This is not quite the same idea as streaming SGD. I'm not sure this fits the RDD model well, since it entails updating one element at a time, but mini-batch could be reasonable.
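The "start a new factorization from the current model solution" idea can be illustrated with a toy dense ALS: the same routine accepts optional initial item factors, so a re-train from the previous model's factors needs far fewer iterations than a cold start. This is a self-contained numpy sketch under simplifying assumptions (dense ratings, no missing-entry masking, hypothetical names), not the MLlib implementation.

```python
import numpy as np

def als(R, k=4, reg=0.1, iters=10, init_item_factors=None, seed=0):
    """Minimal dense ALS: alternate ridge-regression solves for user and
    item factors. Passing init_item_factors warm-starts the factorization
    from a previous model instead of a random state."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    Y = (init_item_factors if init_item_factors is not None
         else rng.normal(scale=0.1, size=(n_items, k)))
    X = np.zeros((n_users, k))
    I = reg * np.eye(k)
    for _ in range(iters):
        X = np.linalg.solve(Y.T @ Y + I, Y.T @ R.T).T  # fix items, solve users
        Y = np.linalg.solve(X.T @ X + I, X.T @ R).T    # fix users, solve items
    return X, Y

rng = np.random.default_rng(42)
R = rng.normal(size=(30, 6)) @ rng.normal(size=(6, 20))  # low-rank "ratings"
X1, Y1 = als(R, k=6, iters=15)                           # cold start
# Incremental batch rebuild: re-train from the previous item factors,
# using only a couple of iterations.
X2, Y2 = als(R, k=6, iters=2, init_item_factors=Y1)
err_warm = np.linalg.norm(R - X2 @ Y2.T)
```

In a real incremental setting, R would also include the newly collected ratings; the warm start just means the solver begins near the old solution rather than from random factors.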
Re: Is it possible to do incremental training using ALSModel (MLlib)?
There is a JIRA for it: https://issues.apache.org/jira/browse/SPARK-4981
Re: Is it possible to do incremental training using ALSModel (MLlib)?
I was under the impression that ALS wasn't designed for it :-< The famous eBay online recommender uses SGD. However, you can try using the previous model as a starting point, and gradually reduce the number of iterations after the model stabilizes. I've never verified this idea, so you need to at least cross-validate it before putting it into production.
Is it possible to do incremental training using ALSModel (MLlib)?
Hi all,

I'm curious about MLlib and if it is possible to do incremental training on the ALSModel.

Usually training is run first, and then you can query. But in my case, data is collected in real time, and I want the predictions of my ALSModel to consider the latest data without a complete re-training phase.

I've checked out these resources, but could not find any info on how to solve this:
https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html
http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html

My question fits in a larger picture where I'm using Prediction IO, and this in turn is based on Spark.

Thanks in advance for any advice!

Wouter