If you are using any of the 'samplingRate' parameters, then the code
is using a random number generator to select some subset of things to
look at. That means you could get different results, due to different
neighborhoods, etc., on each request.

Is it bad behavior? Well:

1) If sampling rates aren't too low, the results shouldn't be very
different, even if they are not identical. So one conclusion could be
that sampling is having too large an effect and the rate needs to go up.

2) The assumption is that any of the slightly different results you
may get are about equally 'good' anyway.

3) I think of computing recommendations as a relatively infrequent
event. You might compute them once a day or once an hour. Or you
compute on the fly and cache the result, either externally or in the
framework. So it shouldn't be the case that the same recommendations
are computed over and over in a row, where the differences might
become noticeable to a user of an application.
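
To make #3 concrete, here is a rough sketch of an external cache in
front of the recommender. It is only an illustration: the class
CachedRecommendations and the 'compute' function are made-up names
standing in for whatever actually produces recommendations, not
anything in Taste itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical wrapper, not a Taste class: remembers the first result
// computed for each user and returns it on later requests.
public class CachedRecommendations {

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, List<String>> compute;

    CachedRecommendations(Function<String, List<String>> compute) {
        this.compute = compute; // stand-in for the real (sampling) recommender
    }

    List<String> recommend(String userID) {
        // Runs 'compute' only if this user has no cached entry yet.
        return cache.computeIfAbsent(userID, compute);
    }

    void clear() {
        // Call this when the underlying data changes.
        cache.clear();
    }

    public static void main(String[] args) {
        CachedRecommendations cached =
            new CachedRecommendations(id -> Arrays.asList("a1", "a2"));
        System.out.println(cached.recommend("u4"));
        System.out.println(cached.recommend("u4")); // served from the cache
    }
}
```

One caveat with this approach: whatever the first call returns is what
gets cached, including an empty list, until the cache is cleared.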


Is it possible to guarantee the same recommendation, even when using
sampling, if the data doesn't change? Yes, it wouldn't be too hard to
always use a local RNG and always seed it the same way. It would be a
performance hit, though.
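
As a minimal sketch of that idea, assuming sampling means "keep each
candidate with probability samplingRate": the class SeededSampling,
the method sample() and the seed value are invented for illustration
and are not how Taste does it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SeededSampling {

    // Hypothetical helper, not part of Taste: keeps each item with
    // probability 'rate', drawing from a local RNG built from 'seed'.
    static List<Integer> sample(List<Integer> items, double rate, long seed) {
        Random rng = new Random(seed); // local RNG, seeded the same way on every call
        List<Integer> kept = new ArrayList<>();
        for (Integer item : items) {
            if (rng.nextDouble() < rate) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            items.add(i);
        }
        List<Integer> first = sample(items, 0.8, 42L);
        List<Integer> second = sample(items, 0.8, 42L);
        // Same seed and same data, so the same subset is selected.
        System.out.println(first.equals(second));
    }
}
```

With a fixed seed the subset depends only on the data and the rate, so
repeated requests see the same neighborhood as long as the data doesn't
change.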

My first reaction though is #3 -- cache. Is that a feasible response?


Sean



On Wed, Jun 3, 2009 at 8:29 PM, Otis Gospodnetic
<[email protected]> wrote:
> Hello,
>
> I haven't debugged this yet, but I was playing with sampling rate in Taste 
> and noticed a weird behaviour where the recommender doesn't give consistent 
> results -- when it gives them they are always the same, but sometimes it 
> doesn't give them.  For example:
>
>
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'    -- no 
> recommendations from this call!
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'    -- no 
> recommendations from this call!
> $ curl 'http://localhost:8080/re/recommend?userID=u4&howMany=10'
> a1
> a2
>
> Another way to see this is if I use different sampling rates and collect 
> output, like this:
> $ for x in `seq 1 1000`; do curl --silent 
> 'http://localhost:8080/re/recommend?userID=u4&howMany=10'; done > (output 
> file here)
>
> I get this:
>
> -rw-r--r-- 1 otis otis 5994 2009-06-03 15:24 out-1-sr0.8
> -rw-r--r-- 1 otis otis 5988 2009-06-03 15:24 out-2-sr0.8    -- different 
> outputs!
>
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-1-sr0.9
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:23 out-2-sr0.9
>
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-1-sr0.99
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:22 out-2-sr0.99
>
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:20 out-1-sr1.0
> -rw-r--r-- 1 otis otis 6000 2009-06-03 15:21 out-2-sr1.0
>
> If this worked consistently, the outputs should be identical, no?
>
> This doesn't look normal...bug?
> I'm attaching my sample input (but ML software may strip it).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
