On Fri, May 1, 2009 at 5:22 AM, Otis Gospodnetic
<[email protected]> wrote:
> Some feedback from my Taste experience.  Tanimoto was the bottleneck for me, 
> too.  I used the highly sophisticated kill -QUIT pid method to determine 
> that.  Such kills always caught Taste in Tanimoto part of the code.

Yeah, er, the correlation is certainly consuming most of the time in
this scenario. Tanimoto should now be no slow*er* than the cosine
measure, though.

By default the user neighborhood component searches among all users
for the closest neighbors. That is a lot of similarities to compute,
which is why it can be better to draw neighbor candidates from only a
sample of all users.
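
To make that concrete, here is a rough sketch of a user-based
recommender that samples 10% of users when building the neighborhood.
It is written against the current Mahout Taste API; class names,
package names, and the exact constructor that takes a sampling rate
may differ in the version you are running, and ratings.csv, user 123,
the 50-neighbor size, and the 0.1 rate are all just illustrative:

  import java.io.File;
  import java.util.List;

  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
  import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
  import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
  import org.apache.mahout.cf.taste.model.DataModel;
  import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
  import org.apache.mahout.cf.taste.recommender.RecommendedItem;
  import org.apache.mahout.cf.taste.recommender.Recommender;
  import org.apache.mahout.cf.taste.similarity.UserSimilarity;

  public final class SampledNeighborhoodSketch {
    public static void main(String[] args) throws Exception {
      // ratings.csv is a placeholder: one "userID,itemID,preference" per line
      DataModel model = new FileDataModel(new File("ratings.csv"));
      UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);
      // 50 nearest neighbors, but only 10% of users considered as candidates
      UserNeighborhood neighborhood =
          new NearestNUserNeighborhood(50, similarity, model, 0.1);
      Recommender recommender =
          new GenericUserBasedRecommender(model, neighborhood, similarity);
      // Recommend 10 items to user 123 (an arbitrary ID from the data)
      List<RecommendedItem> recs = recommender.recommend(123L, 10);
      for (RecommendedItem rec : recs) {
        System.out.println(rec);
      }
    }
  }

The last constructor argument is the sampling rate; raising or lowering
it trades neighborhood quality against per-request time.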

> Do you know, roughly, what that nontrivial amount might be? e.g. 10% or more?

It really depends on the nature of the data and what tradeoff you want
to make. I have not studied this in detail. Anecdotally, on a
large-ish data set you can ignore most users and still end up with an
OK neighborhood.

Actually, I should do a bit of math to get an analytical result on
this; let me do that.
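
In the meantime, a back-of-envelope sketch (in LaTeX notation), just
assuming users are kept independently with probability p and that a
user's "true" neighborhood has k members:

  P(\text{a given true neighbor is kept}) = p, \qquad
  E[\text{true neighbors kept}] = pk, \qquad
  P(\text{none kept}) = (1 - p)^k

  \text{e.g. } p = 0.1,\ k = 50: \quad (0.9)^{50} \approx 0.005

So even at 10% sampling you almost never lose the entire true
neighborhood; you just get somewhat worse neighbors on average.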


> Also, does the "nearly instantaneous" refer to calling Taste with a single 
> recommend request at a time?  I'm asking because I recently did some heavy 
> duty benchmarking and things were definitely not instantaneous when I 
> increased the number of concurrent requests.  To make things fast (e.g. under 
> 100 ms avg.) and run in reasonable amount of memory, I had to resort to 
> remove-noise-users-and-items-from-input-and-then-read-the-data-model.... 
> which means users who look like noise to the system (and that's a lot of them 
> in order to keep things fast and limit memory usage) will not get 
> recommendations.

I suppose I just meant compared to loading the entire DataModel. It
should have been in the hundreds of milliseconds compared to a good 30
seconds.

One recent benchmark I can offer: on a chunky machine (8 cores
@ 2GHz or so, 20GB RAM), using the 10M-rating data set from GroupLens
and slope-one, recommendations are produced in about 400ms each. Not
terrible, but slow for real-time usage. Precomputing in some way seems
ideal.
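
For reference, a minimal sketch of that kind of slope-one setup, again
against the Taste API as I know it (SlopeOneRecommender lives under
org.apache.mahout.cf.taste.impl.recommender.slopeone); the GroupLens
ratings.dat would first need converting to the comma-separated
"userID,itemID,rating" form that FileDataModel reads, so ratings.csv
below is a placeholder:

  import java.io.File;
  import java.util.List;

  import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
  import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
  import org.apache.mahout.cf.taste.model.DataModel;
  import org.apache.mahout.cf.taste.recommender.RecommendedItem;
  import org.apache.mahout.cf.taste.recommender.Recommender;

  public final class SlopeOneTimingSketch {
    public static void main(String[] args) throws Exception {
      // ratings.csv: the GroupLens ratings converted to "userID,itemID,rating"
      DataModel model = new FileDataModel(new File("ratings.csv"));
      Recommender recommender = new SlopeOneRecommender(model);

      // Time a single request; a real benchmark would loop over many user IDs
      long start = System.currentTimeMillis();
      List<RecommendedItem> recs = recommender.recommend(123L, 10);
      long elapsed = System.currentTimeMillis() - start;
      System.out.println(recs.size() + " recommendations in " + elapsed + "ms");
    }
  }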

Locally on my desktop (2 cores @ 2.5GHz, 1GB heap) this sample code is
producing recommendations in about 550ms. If you go to 10% sampling,
that drops to about 350ms.

Concurrency: response time should stay reasonably constant as long as
the number of concurrent requests is <= the number of cores. One
factor that might slow this down is the caches in the code, which
involve a bit of synchronization; I have found that to be a minor
bottleneck. Obviously, once you scale beyond the number of cores,
response time increases roughly linearly with the number of concurrent
requests.
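
If it helps, the sort of concurrency measurement I have in mind looks
roughly like the sketch below: push a batch of recommend() calls
through a fixed-size thread pool and watch where the average latency
starts climbing as the pool size passes the core count. The class
name, the 10-item request size, and cycling through a fixed set of
user IDs are all just for illustration:

  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.Callable;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;

  import org.apache.mahout.cf.taste.recommender.Recommender;

  public final class ConcurrencySketch {

    // Pushes 'requests' recommend() calls through 'concurrency' threads and
    // returns the mean per-request latency in milliseconds.
    static double averageLatencyMs(final Recommender recommender,
                                   final long[] userIDs,
                                   int concurrency,
                                   int requests) throws Exception {
      ExecutorService pool = Executors.newFixedThreadPool(concurrency);
      List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
      for (int i = 0; i < requests; i++) {
        final long userID = userIDs[i % userIDs.length];
        tasks.add(new Callable<Long>() {
          public Long call() throws Exception {
            long start = System.currentTimeMillis();
            recommender.recommend(userID, 10);
            return System.currentTimeMillis() - start;
          }
        });
      }
      long total = 0L;
      for (Future<Long> f : pool.invokeAll(tasks)) {
        total += f.get();
      }
      pool.shutdown();
      return total / (double) requests;
    }
  }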
