Re: RowSimilarityJob improvements

Ted Dunning Sat, 13 Aug 2011 11:58:39 -0700

I have heard various arguments that favor retaining the most recent 
interactions or favor a fair sample or favor taking the earliest interactions. 
These can even be combined with biased samples. I haven't seen much difference 
between these approaches. I think the lack of difference is largely due to 
the fact that the sampling falls most heavily on items that we care very little 
about in recommendations since they are the most popular items that are 
obviously getting plenty of traffic anyway.
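
To make the three options concrete, here is a minimal sketch in Python of capping the interactions kept per item. It is not the RowSimilarityJob code; the function name cap_interactions, the (user, item, timestamp) tuple layout, and the strategy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def cap_interactions(interactions, max_per_item, strategy="random", seed=0):
    """Keep at most max_per_item interactions for each item.

    interactions: iterable of (user, item, timestamp) tuples (assumed layout).
    strategy: "random" (fair sample), "recent", or "earliest" (hypothetical labels).
    """
    rng = random.Random(seed)

    # Group interactions by item so each item can be capped independently.
    by_item = defaultdict(list)
    for user, item, ts in interactions:
        by_item[item].append((user, item, ts))

    kept = []
    for rows in by_item.values():
        if len(rows) <= max_per_item:
            # Unpopular items are below the cap and pass through untouched.
            kept.extend(rows)
        elif strategy == "random":
            kept.extend(rng.sample(rows, max_per_item))
        elif strategy == "recent":
            kept.extend(sorted(rows, key=lambda r: r[2])[-max_per_item:])
        elif strategy == "earliest":
            kept.extend(sorted(rows, key=lambda r: r[2])[:max_per_item])
        else:
            raise ValueError("unknown strategy: %s" % strategy)
    return kept
```

Whichever strategy is chosen, only items above the cap lose any data, so the choice affects only the most popular items.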



On Aug 13, 2011, at 2:31 AM, Sebastian Schelter <[email protected]> wrote:

> One thing I'm currently looking into is how to sample the input. Ted has 
> stated that you usually only need to look at a few hundred or thousand 
> ratings per item as you don't learn anything new from the rest. Would it be 
> sufficient to randomly sample the ratings of an item then? That's what I'm 
> currently doing but I wonder whether there are more clever ways to do this.
