Hello,
I've been using Taste for a while, but it's not scaling well, and I suspect I'm
doing something wrong.
When I say "not scaling well", this is what I mean:
* I have 1 week's worth of data ((user, item) datapoints)
* I don't have item preferences, so I'm using the boolean model
* I have caching in front of Taste, so the rate of requests that Taste needs to
handle is only 150-300 reqs/minute/server
* The server is an 8-core 2.5GHz 32-bit machine with 32 GB of RAM
* I use 2GB heap (-server -Xms2000M -Xmx2000M -XX:+AggressiveHeap
-XX:MaxPermSize=128M -XX:+CMSClassUnloadingEnabled
-XX:+CMSPermGenSweepingEnabled) and Java 1.5 (upgrade scheduled for Spring)
** The bottom line is that with all of the above, I have to filter out less
popular items and less active users in order to return recommendations in a
reasonable amount of time (e.g. 100-200 ms at the 150-300 reqs/min rate).
After this filtering I end up with roughly 30K users and 50K items, and
that's what I use to build the DataModel. If I remove the filtering and let
more data in, performance goes down the drain.
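For reference, the filtering I mention is nothing fancy, just count
thresholds, roughly like this simplified sketch (the class and threshold
names are made up for illustration, it's not my actual code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrefFilter {

    // Keep only (user, item) datapoints where the item appears at least
    // minItemCount times and the user has at least minUserCount datapoints.
    // Each long[] is {userID, itemID}.
    static List<long[]> filter(List<long[]> prefs, int minUserCount, int minItemCount) {
        Map<Long, Integer> userCounts = new HashMap<Long, Integer>();
        Map<Long, Integer> itemCounts = new HashMap<Long, Integer>();
        for (long[] p : prefs) {
            increment(userCounts, p[0]);
            increment(itemCounts, p[1]);
        }
        List<long[]> kept = new ArrayList<long[]>();
        for (long[] p : prefs) {
            if (userCounts.get(p[0]) >= minUserCount
                    && itemCounts.get(p[1]) >= minItemCount) {
                kept.add(p);
            }
        }
        return kept;
    }

    private static void increment(Map<Long, Integer> counts, long key) {
        Integer c = counts.get(key);
        counts.put(key, c == null ? 1 : c + 1);
    }
}
```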
My feeling is that 30K users and 50K items make for an awfully small data
set, and that Taste, especially at only 150-300 reqs/min on an 8-core
server, should be much faster. I suspect Taste is really capable of
handling more data, faster, and that I'm doing something wrong somewhere.
Here is the code I use to construct the recommender:
idMigrator = LocalMemoryIDMigrator.getInstance();
model = MyDataModel.getInstance("itemType");
// UserSimilarity similarity = new LogLikelihoodSimilarity(model);
similarity = new TanimotoCoefficientSimilarity(model);
similarity = new CachingUserSimilarity(similarity, model);
// hoodSize is 50, minSimilarity is 0.1, samplingRate is 1.0
hood = new NearestNUserNeighborhood(hoodSize, minSimilarity, similarity,
model, samplingRate);
recommender = new GenericUserBasedRecommender(model, hood, similarity);
recommender = new CachingRecommender(recommender);
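In case it helps to see what I expect the similarity to be doing: my
understanding is that, with boolean preferences, TanimotoCoefficientSimilarity
boils down to |intersection| / |union| over the two users' item sets. A
standalone sketch of that computation (my own code, not Taste's):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TanimotoSketch {

    // Tanimoto (a.k.a. Jaccard) coefficient over two users' item-ID sets:
    // |intersection| / |union|.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<Long> intersection = new HashSet<Long>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    public static void main(String[] args) {
        Set<Long> u1 = new HashSet<Long>(Arrays.asList(1L, 2L, 3L, 4L));
        Set<Long> u2 = new HashSet<Long>(Arrays.asList(3L, 4L, 5L, 6L));
        // 2 items in common, 6 distinct items total -> 2/6
        System.out.println(tanimoto(u1, u2));
    }
}
```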
What do you think of the above numbers?
Thanks,
Otis