Both ideas are worth trying and evaluating. My intuition is that it is
still best to keep the most popular items over the less popular items.
Really, I would say, only consider trimming data on items that are so
unpopular that any data you have on them is likely noise. Keep data on
items that are merely "sort of" unpopular.
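
To make that concrete, here is a rough sketch of trimming only the noise-level items. It is plain Python for brevity, not Taste code; the `(user, item)` pair layout and the `min_users` threshold are assumptions made up for illustration, and the right cutoff is something to evaluate on your own data.

```python
from collections import Counter

def trim_noise_items(prefs, min_users=2):
    """Drop preferences for items seen by fewer than min_users distinct users.

    prefs is an iterable of (user_id, item_id) pairs. min_users is a
    hypothetical knob: popular and "sort of" unpopular items are kept,
    and only the extreme tail is removed.
    """
    prefs = list(prefs)
    # Count distinct users per item (set() removes duplicate pairs first).
    users_per_item = Counter(item for _, item in set(prefs))
    return [(u, i) for (u, i) in prefs if users_per_item[i] >= min_users]

prefs = [("u1", "a"), ("u2", "a"), ("u3", "a"),
         ("u1", "b"), ("u2", "b"),
         ("u3", "c")]  # "c" has a single user, so it is treated as noise
trimmed = trim_noise_items(prefs, min_users=2)
```

With a low threshold like this, nearly all of the data survives, which is the point: trim the extreme tail, not the merely unpopular.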

There are then other ways in the algorithm implementations to trade
space for speed or accuracy. But best to start with a lot of data in
general rather than trim a lot upfront.

Again I would just separate the concern about which data points to
keep from the concern about which items should be recommendable -- the
latter is what the Rescorer is for.
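
A minimal sketch of that separation, again in plain Python rather than the actual Rescorer interface: the scores stand in for whatever the recommender computed from all the data, and a filter applied only at recommendation time decides what is recommendable. The names `recommend`, `is_filtered` and `item_age_days`, and the 30-day cutoff, are hypothetical.

```python
def recommend(scored_items, is_filtered, how_many=3):
    """Return the top item ids by score, skipping items the filter rejects.

    scored_items maps item_id -> estimated preference, computed from the
    full data set. is_filtered decides only what may be shown; it does not
    change which data points were used to compute the scores.
    """
    candidates = [(item, score) for item, score in scored_items.items()
                  if not is_filtered(item)]
    # Highest score first; ties are broken by item id for a stable order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:how_many]]

item_age_days = {"a": 1, "b": 40, "c": 3, "d": 90}   # made-up ages
scores = {"a": 0.7, "b": 0.9, "c": 0.8, "d": 0.95}   # made-up scores

def too_old(item):
    # Old news still informs the model but is never recommended.
    return item_age_days[item] > 30

top = recommend(scores, too_old, how_many=2)
```

Here the old items "b" and "d" still contribute to the model, but only "c" and "a" can come back as recommendations.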

On Wed, Oct 22, 2008 at 6:43 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Because I don't want to recommend old news, I *think* I can chop off some of 
> the tail at some(?) expense of quality.
> Now that I see the distribution of items more clearly, I am also wondering if 
> feeding the most popular items into the recommendation engine is really 
> valuable.  Items are very popular because lots of people consumed them.  This 
> produces a lot of overlap between users, which is good, but maybe it's too 
> good for its own good (kind of like the Harry Potter problem)?  I wonder if 
> it would make sense not to include (and thus not recommend) the most popular 
> items?  Hm, doesn't sound right, because of my 705K users only about 98K have 
> seen the top 10 items already.  But would it make sense to artificially lower 
> their rating, to put a damper on them?
