On Fri, May 1, 2009 at 5:22 AM, Otis Gospodnetic <[email protected]> wrote:
> Some feedback from my Taste experience. Tanimoto was the bottleneck for me,
> too. I used the highly sophisticated kill -QUIT pid method to determine
> that. Such kills always caught Taste in Tanimoto part of the code.
Yeah, er, the correlation is certainly consuming most of the time in this
scenario. Tanimoto should now be slow*er* than the cosine measure, though.
By default the user neighborhood component searches among all users for
the closest neighbors. That's a lot of similarities to compute, which is
why it can be better to draw the neighborhood from a sample of all users
(see the sketch at the end of this message).

> Do you know, roughly, what that nontrivial amount might be? e.g. 10% or more?

It really depends on the nature of the data and what tradeoff you want to
make; I have not studied this in detail. Anecdotally, on a large-ish data
set you can ignore most users and still end up with an OK neighborhood.
Actually, I should do a bit of math to get an analytical result on this;
let me do that.

> Also, does the "nearly instantaneous" refer to calling Taste with a single
> recommend request at a time? I'm asking because I recently did some heavy
> duty benchmarking and things were definitely not instantaneous when I
> increased the number of concurrent requests. To make things fast (e.g. under
> 100 ms avg.) and run in reasonable amount of memory, I had to resort to
> remove-noise-users-and-items-from-input-and-then-read-the-data-model....
> which means users who look like noise to the system (and that's a lot of them
> in order to keep things fast and limit memory usage) will not get
> recommendations.

I suppose I just meant compared to loading the entire DataModel. It should
have been in the hundreds of milliseconds, compared to a good 30 seconds.

One recent benchmark I can offer: on a chunky machine (8 cores @ 2GHz or
so, 20GB RAM), using the 10M-rating data set from GroupLens and slope-one,
recommendations are produced in about 400ms each. Not terrible, but slow
for real-time usage; precomputing in some way seems ideal. Locally on my
desktop (2 cores @ 2.5GHz, 1GB heap) this sample code produces
recommendations in about 550ms. If you go to 10% sampling, that drops to
about 350ms.

Concurrency: response time should stay reasonably constant as long as the
number of concurrent requests is <= the number of cores (rough harness
below). One factor that might slow this down is the caches in the code,
which involve a bit of synchronization; I have found that to be a minor
bottleneck. Obviously, once you scale beyond the number of cores, response
time increases linearly with the number of concurrent requests.
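
To make the sampling concrete, here is roughly the shape of that sample
code. This is a minimal sketch, not the exact code I'm timing: the class
names and the four-argument NearestNUserNeighborhood constructor (whose
last argument is the sampling rate) are written from memory against
current trunk, and the file name, user ID, neighborhood size, and 0.1
rate are just placeholders.

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public final class SampledNeighborhoodExample {

  public static void main(String[] args) throws Exception {
    // One userID,itemID,rating triple per line
    DataModel model = new FileDataModel(new File("ratings.csv"));
    UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);
    // Last argument is the sampling rate: only about 10% of all users are
    // considered when forming each neighborhood, instead of every user
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(20, similarity, model, 0.1);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    List<RecommendedItem> recs = recommender.recommend(1L, 10);
    System.out.println(recs);
  }
}

Leaving out the sampling rate (or setting it to 1.0) gives the default
behavior of scanning every user.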
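
In the meantime, a crude back-of-envelope, just to set expectations: if
you sample a fraction p of users uniformly at random, each of a user's
true top-k neighbors survives with probability p, so about p*k of them
remain in expectation, and the chance that *none* of the top k survives
is (1-p)^k. With p = 0.1 and k = 20 that's roughly 0.12. The real
analysis also has to account for how quickly similarity falls off beyond
the top k, which is the part I still want to work out.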
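
For anyone who wants to reproduce those numbers, the harness is nothing
fancier than the sketch below: slope-one over the GroupLens data, a fixed
pool of worker threads each firing recommend() calls, and wall-clock
timing. The file name, user IDs, and request counts are placeholders, the
"::"-to-"," conversion of ratings.dat is assumed to have been done
beforehand, and as above the class names are from memory.

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

public final class ConcurrencyBench {

  public static void main(String[] args) throws Exception {
    // GroupLens ratings.dat with "::" replaced by ","
    DataModel model = new FileDataModel(new File("ratings.csv"));
    // Item-item diffs are precomputed in memory when this is constructed
    final Recommender recommender = new SlopeOneRecommender(model);

    final long[] userIDs = {1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L};
    final int requestsPerThread = 50;
    // Try a pool larger than the core count to watch latency climb
    int threads = Runtime.getRuntime().availableProcessors();

    ExecutorService executor = Executors.newFixedThreadPool(threads);
    long start = System.currentTimeMillis();
    List<Future<Void>> futures = new ArrayList<Future<Void>>();
    for (int t = 0; t < threads; t++) {
      futures.add(executor.submit(new Callable<Void>() {
        public Void call() throws Exception {
          Random random = new Random();
          for (int i = 0; i < requestsPerThread; i++) {
            recommender.recommend(userIDs[random.nextInt(userIDs.length)], 10);
          }
          return null;
        }
      }));
    }
    for (Future<Void> f : futures) {
      f.get();  // propagate any exception from the workers
    }
    executor.shutdown();
    long elapsed = System.currentTimeMillis() - start;
    // Each thread issues its requests sequentially, so this approximates
    // the average per-request response time at this level of concurrency
    System.out.println("~avg ms per request: "
        + ((double) elapsed / requestsPerThread));
  }
}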
