Re: Recommenders and MABs

2016-09-21 Thread Dmitriy Lyubimov
There's been a great blog series on that on the richrelevance engineering blog...
but I have a vague feeling, based on what you are saying, that it may all be old
news to you...

[1] http://engineering.richrelevance.com/bandits-recommendation-systems/
and there's more in the series


Recommenders and MABs

2016-09-17 Thread Pat Ferrel
I’ve been thinking about how one would implement an application that only shows 
recommendations. This is partly because people want to build such things.

There are many problems with this, including cold start and overfit. However,
MABs face these same problems and solve them with sampling schemes. So imagine
that you have several models from which to draw recommendations: 1) a CF-based
recommender, 2) random recommendations, 3) popular recs (by some measure). Treat
each individual as facing an MAB whose sampling algo is trained by their feedback
to pull recs from the 3 (or more) arms. This implies one MAB per user.
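
To make that concrete, here is a minimal sketch of the per-user bandit, assuming
a binary reward (the user acts on a shown rec or not) and Thompson sampling with
Beta posteriors; the arm names and the reward definition are just placeholders:

import random

ARMS = ["cf", "popular", "random"]   # one arm per candidate rec source

class UserBandit:
    """One bandit per user: Thompson sampling over Beta posteriors on the
    chance that recs from each source get acted on."""

    def __init__(self, arms=ARMS):
        # Beta(1, 1) priors: with no data, every arm is equally likely to win
        self.successes = {a: 1.0 for a in arms}
        self.failures = {a: 1.0 for a in arms}

    def choose_arm(self):
        # Draw a plausible reward rate for each arm and play the best draw
        draws = {a: random.betavariate(self.successes[a], self.failures[a])
                 for a in self.successes}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # reward = 1 if the user acted on the recs served from that arm
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

The CF, popular, and random models themselves live outside the bandit; it only
learns which source to trust for this particular user.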

The very first visit to the application would draw randomly from the arms. Since
there is no user data yet, the recs engine would have to be able to respond
anyway (perhaps with random recs); the same would be true of the popular model
(returning random at first), and random is always happy. The problem with this is
that none of the arms are completely independent, and the model driving each arm
will change over time.
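
The fallback could look something like this, where cf_recs, popular_recs, and
random_recs are hypothetical calls into the three underlying models:

def recs_for_arm(arm, user_id, n=10):
    # Serve n recs from the chosen arm, degrading to random whenever the
    # underlying model has nothing yet (cold start).
    if arm == "cf":
        recs = cf_recs(user_id, n)       # empty for a brand-new user
    elif arm == "popular":
        recs = popular_recs(n)           # empty until interactions accumulate
    else:
        recs = []
    return recs if recs else random_recs(n)   # random is always happy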

The first time a user visits, a new MAB is created for them. It will draw
randomly from all arms but may get better responses from popular (with no
user-specific data in the system yet for CF). So the sampling will start to favor
popular while still exploring the other arms. When enough data has accumulated to
start making good recs, the CF recommender will begin to outperform popular and
will get more of the user’s reinforcement.

This seems to work, but with several unanswered questions and one problem to
avoid: overfit. We would need a sampling method that never fully converges, or
the user would never get a chance to show their expanding/changing preferences.
The CF recommender will also overfit if non-CF items are not mixed in. Of the
sampling methods I’ve seen for MABs, greedy will not work, but even with some
form of Bayesian/Thompson sampling the question is how to parameterize the
sampling. With too little convergence we get sub-optimal exploit, but we get the
same with too much convergence, and that will also overfit the CF recs.
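
One way to keep the sampler from ever fully converging, assuming the
Beta/Thompson setup sketched above: cap the total evidence per arm (equivalently,
exponentially forget old observations) so posterior variance, and therefore
exploration, never collapses. N_MAX below is purely illustrative and is exactly
the parameter in question:

N_MAX = 200.0   # illustrative cap on evidence per arm; this is the knob to tune

def update_capped(bandit, arm, reward):
    # Standard Thompson update...
    if reward:
        bandit.successes[arm] += 1
    else:
        bandit.failures[arm] += 1
    # ...then rescale so successes + failures never exceed N_MAX. Old
    # evidence decays, the posterior stays wide enough to keep exploring,
    # and shifting user preferences can still surface.
    total = bandit.successes[arm] + bandit.failures[arm]
    if total > N_MAX:
        scale = N_MAX / total
        bandit.successes[arm] *= scale
        bandit.failures[arm] *= scale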

I imagine we could train a meta-model on the mature explore amount by trying
different parameterizations and finding out whether there is one answer for all
users, or we could resort to heuristic rules, even business rules.
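
For finding that parameterization, the simplest version of the meta-model I can
picture is an offline sweep: replay logged sessions (or a simulator) under a few
candidate caps and check whether one explore level works for everyone.
replay_reward here is a hypothetical evaluator, not an existing function:

CANDIDATE_CAPS = [50, 100, 200, 500, 1000]

def pick_cap(logged_sessions):
    # Score each cap by the reward a capped bandit would have earned on the
    # replayed sessions; if no single cap wins across users, that argues for
    # per-segment or rule-based settings instead.
    scores = {cap: replay_reward(logged_sessions, n_max=cap)
              for cap in CANDIDATE_CAPS}
    return max(scores, key=scores.get)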

If anyone has read this far, any ideas or comments?