Re: recommendations with duplicate ratings

2016-02-15 Thread Nick Pentreath
Yes, for implicit data you need to sum up the "ratings" (actually view them
as "weights") for each user-item pair. I do this in my ALS application.

For ecommerce, say a "view" event has a weight of 1.0 and a "purchase" a
weight of 3.0. Then adding multiple events together for a given user and
item makes sense.
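
A minimal sketch of that summing step, written in plain Python so it runs
without a cluster. The event names and weights are the ones from the example
above; the EVENT_WEIGHTS table, the aggregate_weights helper and the sample
events are hypothetical. On an RDD the same step would typically be a
reduceByKey keyed on (user, item) before the Rating objects are built.

```python
from collections import defaultdict

# Weights from the example above: a view counts 1.0, a purchase 3.0.
EVENT_WEIGHTS = {"view": 1.0, "purchase": 3.0}

def aggregate_weights(events):
    """Sum event weights so each (user, item) pair appears exactly once."""
    totals = defaultdict(float)
    for user, item, event_type in events:
        totals[(user, item)] += EVENT_WEIGHTS[event_type]
    return dict(totals)

# Hypothetical sample: user 1 views item 10 twice and then buys it.
events = [
    (1, 10, "view"),
    (1, 10, "view"),
    (1, 10, "purchase"),
    (2, 10, "view"),
]
weights = aggregate_weights(events)
```

Each key of the resulting dictionary is one user-item pair, so the data
handed to ALS has no duplicates left.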

ALS assumes an input ratings matrix (even though Spark's implementation
takes an RDD[Rating]), so the algorithm itself doesn't support duplicate
ratings.

On Mon, 15 Feb 2016 at 23:24, Sean Owen  wrote:

> You're asking what happens when you put many ratings for one user-item
> pair in the input, right? I'm saying you shouldn't do that --
> aggregate them into one pair in your application.
>
> For rating-like (explicit) data, it doesn't really make sense
> otherwise. The only sensible aggregation is last-first, but there's no
> natural notion of 'last' in the RDD you supply.
>
> For count-like (implicit) data, it makes sense to sum the inputs, but
> I don't think that is done automatically. I skimmed the code and
> didn't see it. So you would sum the values per user-item anyway.
>
> On Mon, Feb 15, 2016 at 9:05 PM, Roberto Pagliari
>  wrote:
> > Hi Sean,
> > I'm not sure what you mean by aggregate. The input of trainImplicit is an
> > RDD of Ratings.
> >
> > I find it odd that duplicate ratings would mess with ALS in the implicit
> > case. It'd be nice if it didn't.
> >
> >
> > Thank you,
> >
> > On 15/02/2016 20:49, "Sean Owen"  wrote:
> >
> >>I believe you need to aggregate inputs per user-item in your call. I
> >>am actually not sure what happens if you don't. I think it would
> >>compute the factors twice and one would win, so yes I think it would
> >>effectively be ignored.  For implicit, that wouldn't work correctly,
> >>so you do need to aggregate.
> >>
> >>On Mon, Feb 15, 2016 at 8:30 PM, Roberto Pagliari
> >> wrote:
> >>> What happens when duplicate user/ratings are fed into ALS (the implicit
> >>> version, specifically)? Are duplicates ignored?
> >>>
> >>> I'm asking because that would save me a distinct.
> >>>
> >>>
> >>>
> >>> Thank you,
> >>>
> >
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: recommendations with duplicate ratings

2016-02-15 Thread Sean Owen
You're asking what happens when you put many ratings for one user-item
pair in the input, right? I'm saying you shouldn't do that --
aggregate them into one pair in your application.

For rating-like (explicit) data, it doesn't really make sense
otherwise. The only sensible aggregation is last-first, but there's no
natural notion of 'last' in the RDD you supply.

For count-like (implicit) data, it makes sense to sum the inputs, but
I don't think that is done automatically. I skimmed the code and
didn't see it. So you would sum the values per user-item anyway.
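
For the explicit case, a "last one wins" aggregation needs an ordering that
the application supplies itself. The sketch below is plain Python and assumes
each row carries a timestamp; that field, the latest_rating helper and the
sample rows are hypothetical and are not part of Spark's Rating type.

```python
def latest_rating(rows):
    """Keep, per (user, item), the rating with the largest timestamp."""
    latest = {}
    for user, item, rating, ts in rows:
        key = (user, item)
        # Replace the stored rating only when this row is newer.
        if key not in latest or ts > latest[key][1]:
            latest[key] = (rating, ts)
    return {key: rating for key, (rating, _ts) in latest.items()}

# Hypothetical sample: user 1 rated item 10 twice; the later rating is 4.0.
rows = [
    (1, 10, 2.0, 100),
    (1, 10, 4.0, 200),
    (2, 10, 5.0, 150),
]
explicit = latest_rating(rows)
```

For implicit data the same loop would add the values instead of replacing
them.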

On Mon, Feb 15, 2016 at 9:05 PM, Roberto Pagliari
 wrote:
> Hi Sean,
> I'm not sure what you mean by aggregate. The input of trainImplicit is an
> RDD of Ratings.
>
> I find it odd that duplicate ratings would mess with ALS in the implicit
> case. It'd be nice if it didn't.
>
>
> Thank you,
>
> On 15/02/2016 20:49, "Sean Owen"  wrote:
>
>>I believe you need to aggregate inputs per user-item in your call. I
>>am actually not sure what happens if you don't. I think it would
>>compute the factors twice and one would win, so yes I think it would
>>effectively be ignored.  For implicit, that wouldn't work correctly,
>>so you do need to aggregate.
>>
>>On Mon, Feb 15, 2016 at 8:30 PM, Roberto Pagliari
>> wrote:
>>> What happens when duplicate user/ratings are fed into ALS (the implicit
>>> version, specifically)? Are duplicates ignored?
>>>
>>> I'm asking because that would save me a distinct.
>>>
>>>
>>>
>>> Thank you,
>>>
>




Re: recommendations with duplicate ratings

2016-02-15 Thread Roberto Pagliari
Hi Sean,
I'm not sure what you mean by aggregate. The input of trainImplicit is an
RDD of Ratings. 

I find it odd that duplicate ratings would mess with ALS in the implicit
case. It'd be nice if it didn't.


Thank you, 

On 15/02/2016 20:49, "Sean Owen"  wrote:

>I believe you need to aggregate inputs per user-item in your call. I
>am actually not sure what happens if you don't. I think it would
>compute the factors twice and one would win, so yes I think it would
>effectively be ignored.  For implicit, that wouldn't work correctly,
>so you do need to aggregate.
>
>On Mon, Feb 15, 2016 at 8:30 PM, Roberto Pagliari
> wrote:
>> What happens when duplicate user/ratings are fed into ALS (the implicit
>> version, specifically)? Are duplicates ignored?
>>
>> I'm asking because that would save me a distinct.
>>
>>
>>
>> Thank you,
>>





Re: recommendations with duplicate ratings

2016-02-15 Thread Sean Owen
I believe you need to aggregate inputs per user-item in your call. I
am actually not sure what happens if you don't. I think it would
compute the factors twice and one would win, so yes I think it would
effectively be ignored.  For implicit, that wouldn't work correctly,
so you do need to aggregate.
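
To make the difference between a distinct and an aggregation concrete, here
is a small plain-Python sketch with hypothetical sample triples. A distinct
only drops rows that are identical in every field, so a user-item pair with
two different values survives it twice; summing per pair leaves one row.

```python
from collections import defaultdict

# Hypothetical (user, item, value) triples; the first two are exact duplicates.
ratings = [
    (1, 10, 1.0),
    (1, 10, 1.0),
    (1, 10, 3.0),
    (2, 20, 1.0),
]

# A distinct removes only identical triples, so the pair (1, 10) remains twice.
distinct_ratings = sorted(set(ratings))

# Summing per (user, item) leaves exactly one entry per pair.
summed = defaultdict(float)
for user, item, value in ratings:
    summed[(user, item)] += value
summed = dict(summed)
```

Running a distinct before the sum would also discard the repeated identical
event, which changes the total for that pair, so the two steps are not
interchangeable.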

On Mon, Feb 15, 2016 at 8:30 PM, Roberto Pagliari
 wrote:
> What happens when duplicate user/ratings are fed into ALS (the implicit
> version, specifically)? Are duplicates ignored?
>
> I’m asking because that would save me a distinct.
>
>
>
> Thank you,
>




recommendations with duplicate ratings

2016-02-15 Thread Roberto Pagliari
What happens when duplicate user/ratings are fed into ALS (the implicit 
version, specifically)? Are duplicates ignored?

I'm asking because that would save me a distinct.



Thank you,