Re: recommendations with duplicate ratings
Yes, for implicit data you need to sum up the "ratings" (really, view them as "weights") for each user-item pair. I do this in my ALS application. For ecommerce, say a "view" event has a weight of 1.0 and a "purchase" has a weight of 3.0. Then adding multiple events together for a given user and item makes sense.

ALS assumes an input ratings matrix (even though Spark's implementation takes an RDD[Rating]), so the algorithm itself doesn't support duplicate ratings: each user-item cell of that matrix holds a single value.

On Mon, 15 Feb 2016 at 23:24, Sean Owen wrote:
> You're asking what happens when you put many ratings for one user-item
> pair in the input, right? I'm saying you shouldn't do that --
> aggregate them into one pair in your application.
> [...]
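As a rough illustration of the weighting scheme above, here is a plain-Python sketch (no Spark) that sums per-event weights into one value per user-item pair. The event names, the `EVENT_WEIGHTS` table and the `aggregate_implicit` function are hypothetical; the 1.0/3.0 weights are just the example values from the message.

```python
from collections import defaultdict

# Hypothetical event weights, following the example above:
# a "view" counts 1.0 and a "purchase" counts 3.0.
EVENT_WEIGHTS = {"view": 1.0, "purchase": 3.0}


def aggregate_implicit(events):
    """Sum event weights per (user, item) pair.

    `events` is an iterable of (user, item, event_type) tuples. The result
    maps each (user, item) pair to a single total weight, so every pair
    appears exactly once, as the ALS input matrix expects.
    """
    totals = defaultdict(float)
    for user, item, event_type in events:
        totals[(user, item)] += EVENT_WEIGHTS[event_type]
    return dict(totals)


events = [
    (1, 10, "view"),
    (1, 10, "view"),
    (1, 10, "purchase"),
    (2, 10, "view"),
]
print(aggregate_implicit(events))  # {(1, 10): 5.0, (2, 10): 1.0}
```

In PySpark the equivalent step would presumably be a `reduceByKey` over `((r.user, r.product), r.rating)` pairs, mapped back to `Rating` objects before calling `ALS.trainImplicit`; treat that as an untested outline rather than a recipe.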
Re: recommendations with duplicate ratings
You're asking what happens when you put many ratings for one user-item pair in the input, right? I'm saying you shouldn't do that -- aggregate them into one pair in your application.

For rating-like (explicit) data, it doesn't really make sense otherwise. The only sensible aggregation is to keep the last rating, but there's no natural notion of 'last' in the RDD you supply.

For count-like (implicit) data, it makes sense to sum the inputs, but I don't think that is done automatically. I skimmed the code and didn't see it. So you would sum the values per user-item anyway.

On Mon, Feb 15, 2016 at 9:05 PM, Roberto Pagliari wrote:
> Hi Sean,
> I'm not sure what you mean by aggregate. The input of trainImplicit is an
> RDD of Ratings.
> [...]

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
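To make the explicit-data point concrete: keeping "the last" rating is only possible if each record carries its own ordering field. The sketch below assumes a hypothetical timestamp attached to each (user, item, rating) tuple and keeps the rating with the largest timestamp per pair. It is plain Python, not Spark's API, and `latest_explicit` is an illustrative name.

```python
def latest_explicit(ratings):
    """Keep one explicit rating per (user, item): the one with the largest timestamp.

    `ratings` is an iterable of (user, item, rating, timestamp) tuples.
    The timestamp is an assumption: plain (user, item, rating) input carries
    no order, so "last" can only be defined by a field you add yourself.
    On equal timestamps, the first rating seen is kept.
    """
    best = {}
    for user, item, rating, ts in ratings:
        key = (user, item)
        if key not in best or ts > best[key][1]:
            best[key] = (rating, ts)
    # Drop the timestamps, leaving one rating per pair.
    return {key: rating for key, (rating, _) in best.items()}


ratings = [
    (1, 10, 4.0, 100),
    (1, 10, 2.0, 250),  # later rating for the same pair wins
    (2, 10, 5.0, 120),
]
print(latest_explicit(ratings))  # {(1, 10): 2.0, (2, 10): 5.0}
```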
Re: recommendations with duplicate ratings
Hi Sean,

I'm not sure what you mean by aggregate. The input of trainImplicit is an RDD of Ratings.

I find it odd that duplicate ratings would mess with ALS in the implicit case. It'd be nice if it didn't.

Thank you,

On 15/02/2016 20:49, "Sean Owen" wrote:
>I believe you need to aggregate inputs per user-item in your call. I
>am actually not sure what happens if you don't.
>[...]
Re: recommendations with duplicate ratings
I believe you need to aggregate inputs per user-item in your call. I am actually not sure what happens if you don't. I think it would compute the factors twice and one would win, so yes, I think the other duplicate would effectively be ignored. For implicit data, that wouldn't work correctly, so you do need to aggregate.

On Mon, Feb 15, 2016 at 8:30 PM, Roberto Pagliari wrote:
> What happens when duplicate user/ratings are fed into ALS (the implicit
> version, specifically)? Are duplicates ignored?
> [...]
recommendations with duplicate ratings
What happens when duplicate user/item ratings are fed into ALS (the implicit version, specifically)? Are duplicates ignored?

I'm asking because that would save me a distinct().

Thank you,
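On the idea of saving a distinct: a distinct only removes elements that are identical in every field, so it does not leave one entry per user-item pair. The plain-Python sketch below mimics that behaviour with a `set` (an approximation of `RDD.distinct()`, not Spark code) and contrasts it with summing per pair, as suggested in the replies. Both function names are illustrative.

```python
def distinct_ratings(ratings):
    """Mimic a distinct(): drop only exact duplicate (user, item, rating) tuples."""
    return sorted(set(ratings))


def summed_ratings(ratings):
    """Collapse to one entry per (user, item) by summing the values."""
    totals = {}
    for user, item, value in ratings:
        totals[(user, item)] = totals.get((user, item), 0.0) + value
    return sorted((u, i, v) for (u, i), v in totals.items())


raw = [(1, 10, 1.0), (1, 10, 1.0), (1, 10, 3.0)]
print(distinct_ratings(raw))  # [(1, 10, 1.0), (1, 10, 3.0)]: still two entries for (1, 10)
print(summed_ratings(raw))    # [(1, 10, 5.0)]: one entry per pair
```

The distinct version also throws away the fact that the 1.0 event happened twice, which matters for implicit data, where repeated events are meant to add up.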