I did, some 6+ months ago (before the all-IDs-are-longs changes). I remember seeing 
most of the time spent in TanimotoCoefficientSimilarity and thinking "damn, this 
is all just set intersection and basic math operations - how do I speed that 
up?".
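For boolean data, the Tanimoto coefficient of two users is the size of the intersection of their item sets divided by the size of their union. The following is a minimal sketch of computing it cheaply. It assumes each user's item IDs are kept as a sorted, duplicate-free long[]; that is an illustrative assumption, not necessarily how Taste stores preferences. Under that assumption the intersection becomes a single linear merge pass with no hashing or boxing. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration; not Taste's TanimotoCoefficientSimilarity.
final class TanimotoSketch {

    // Counts common elements of two ascending, duplicate-free arrays in one
    // linear merge pass: O(|a| + |b|), with no hashing or boxing.
    static int intersectionSize(long[] a, long[] b) {
        int i = 0;
        int j = 0;
        int common = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                common++;
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return common;
    }

    // Tanimoto coefficient of two item sets: common / (|a| + |b| - common).
    static double tanimoto(long[] a, long[] b) {
        int common = intersectionSize(a, b);
        int union = a.length + b.length - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    public static void main(String[] args) {
        long[] user1 = {42L, 3L, 11L, 7L};
        long[] user2 = {99L, 7L, 11L};
        // The merge requires sorted input, so sort once (e.g. at load time), not per comparison.
        Arrays.sort(user1);
        Arrays.sort(user2);
        System.out.println(tanimoto(user1, user2)); // 2 common, 5 in the union: prints 0.4
    }
}
```

A user-based recommender typically evaluates this for many user pairs per request. In this sketch each pair costs O(|A| + |B|) comparisons over primitive arrays, which is about as cheap as exact set intersection gets without approximation.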

Otis



----- Original Message ----
> From: Grant Ingersoll <[email protected]>
> To: [email protected]
> Sent: Tue, November 24, 2009 3:25:53 PM
> Subject: Re: Taste speed
> 
> Have you done any profiling?  It would be interesting to know where the 
> bottlenecks are on your dataset.
> 
> -Grant
> 
> On Nov 24, 2009, at 2:37 PM, Otis Gospodnetic wrote:
> 
> > Correction to the user and item counts:
> > Users: 25K
> > Items: 2K
> > 
> > I am less worried about increasing the number of potential items to
> > recommend. I am more interested in getting more users into Taste, so that a
> > larger percentage of my users can get recommendations.
> > For example, to filter out users I require a certain level of activity in
> > terms of the number of items previously consumed.
> > With that threshold at 15, I get about 25K users (the figure above), i.e. 25K
> > users consumed 15 or more items.
> > With 10, I get about 50K users who consumed 10 or more items.
> > With 5, I get about 200K users who consumed 5 or more items (presumably
> > even just 5 items would produce good-enough recommendations).
> > 
> > I know I could lower the sampling rate and get more users in, but that feels
> > like cheating and will lower the quality of recommendations. I have a feeling
> > that even with a sampling rate of 1.0 I should be able to get more users into
> > Taste and still have it return recommendations in 100-200ms at only 150-300
> > reqs/minute.
> > 
> > 
> > Otis
> > 
> > 
> > 
> > ----- Original Message ----
> >> From: Otis Gospodnetic 
> >> To: [email protected]
> >> Sent: Tue, November 24, 2009 2:10:07 PM
> >> Subject: Taste speed
> >> 
> >> Hello,
> >> 
> >> I've been using Taste for a while, but it's not scaling well, and I suspect
> >> I'm doing something wrong.
> >> When I say "not scaling well", this is what I mean:
> >> * I have 1 week's worth of data (user,item datapoints)
> >> * I don't have item preferences, so I'm using the boolean model
> >> * I have caching in front of Taste, so the rate of requests that Taste needs
> >>   to handle is only 150-300 reqs/minute/server
> >> * The server is an 8-core 2.5GHz 32-bit machine with 32 GB of RAM
> >> * I use a 2GB heap (-server -Xms2000M -Xmx2000M -XX:+AggressiveHeap
> >>   -XX:MaxPermSize=128M -XX:+CMSClassUnloadingEnabled
> >>   -XX:+CMSPermGenSweepingEnabled) and Java 1.5 (upgrade scheduled for Spring)
> >> 
> >> ** The bottom line is that with all of the above, I have to filter out less
> >> popular items and less active users in order to be able to return
> >> recommendations in a reasonable amount of time (e.g. 100-200 ms at the
> >> 150-300 reqs/min rate). In the end, after this filtering, I end up with,
> >> say, 30K users and 50K items, and that's what I use to build the DataModel.
> >> If I remove filtering and let more data in, the performance goes down the
> >> drain.
> >> 
> >> My feeling is that 30K users and 50K items make for an awfully small data
> >> set, and that Taste, esp. at only 150-300 reqs/min on an 8-core server,
> >> should be much faster. I have a feeling I'm doing something wrong and that
> >> Taste is really capable of handling more data, faster. Here is the code I
> >> use to construct the recommender:
> >> 
> >>    idMigrator = LocalMemoryIDMigrator.getInstance();
> >>    model = MyDataModel.getInstance("itemType");
> >> 
> >>    // ItemSimilarity similarity = new LogLikelihoodSimilarity(model);
> >>    similarity = new TanimotoCoefficientSimilarity(model);
> >>    similarity = new CachingUserSimilarity(similarity, model);
> >> 
> >>    // hood size is 50, minSimilarity is 0.1, samplingRate is 1.0
> >>    hood = new NearestNUserNeighborhood(hoodSize, minSimilarity, similarity,
> >>        model, samplingRate);
> >> 
> >>    recommender = new GenericUserBasedRecommender(model, hood, similarity);
> >>    recommender = new CachingRecommender(recommender);
> >> 
> >> What do you think of the above numbers?
> >> 
> >> Thanks,
> >> Otis
> > 
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
> http://www.lucidimagination.com/search

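The activity threshold and the neighborhood parameters in the quoted messages (hood size 50, minSimilarity 0.1, samplingRate 1.0) can be illustrated with a simplified, hypothetical sketch. It is not Mahout's NearestNUserNeighborhood, only a rough model of a nearest-N search, and every class and method name in it is made up for illustration. It does three things:

- keeps users with at least minItems consumed items;
- scores the target user against every remaining user, or against a random fraction of them when samplingRate < 1.0;
- drops scores below minSimilarity and keeps the n best in a min-heap.

Item IDs are again assumed to be sorted, duplicate-free long[] arrays.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical, simplified model of a nearest-N user neighborhood;
// not Mahout's implementation. Item IDs are sorted, duplicate-free long[].
final class NeighborhoodSketch {

    // Keeps only users who consumed at least minItems items (the activity threshold).
    static Map<Long, long[]> activeUsers(Map<Long, long[]> itemsByUser, int minItems) {
        Map<Long, long[]> kept = new HashMap<>();
        for (Map.Entry<Long, long[]> e : itemsByUser.entrySet()) {
            if (e.getValue().length >= minItems) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Tanimoto coefficient via a linear merge of two sorted arrays.
    static double tanimoto(long[] a, long[] b) {
        int i = 0, j = 0, common = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { common++; i++; j++; }
            else if (a[i] < b[j]) { i++; }
            else { j++; }
        }
        int union = a.length + b.length - common;
        return union == 0 ? 0.0 : (double) common / union;
    }

    // Scores userId against every other user (or a random samplingRate fraction),
    // drops scores below minSimilarity and returns the n most similar, best first.
    static List<Long> nearestN(long userId, Map<Long, long[]> itemsByUser, int n,
                               double minSimilarity, double samplingRate, Random rnd) {
        long[] mine = itemsByUser.get(userId);
        if (mine == null) {
            return Collections.emptyList();
        }
        // Min-heap on similarity: the root is the weakest of the current top n.
        PriorityQueue<Map.Entry<Long, Double>> top =
            new PriorityQueue<>((x, y) -> Double.compare(x.getValue(), y.getValue()));
        for (Map.Entry<Long, long[]> e : itemsByUser.entrySet()) {
            long other = e.getKey();
            // nextDouble() is in [0, 1), so samplingRate 1.0 keeps every candidate.
            if (other == userId || rnd.nextDouble() >= samplingRate) {
                continue;
            }
            double sim = tanimoto(mine, e.getValue());
            if (sim < minSimilarity) {
                continue;
            }
            top.add(new AbstractMap.SimpleEntry<Long, Double>(other, sim));
            if (top.size() > n) {
                top.poll(); // evict the weakest neighbor
            }
        }
        List<Long> result = new ArrayList<>();
        while (!top.isEmpty()) {
            result.add(top.poll().getKey());
        }
        Collections.reverse(result); // heap yields weakest first; flip to best first
        return result;
    }

    public static void main(String[] args) {
        Map<Long, long[]> data = new HashMap<>();
        data.put(1L, new long[] {1, 2, 3, 4, 5});
        data.put(2L, new long[] {1, 2, 3, 4, 6});
        data.put(3L, new long[] {1, 2, 7, 8, 9});
        data.put(4L, new long[] {10, 11});
        Map<Long, long[]> active = activeUsers(data, 5); // user 4 has only 2 items
        // hood size 50, minSimilarity 0.1, samplingRate 1.0, as in the quoted code
        System.out.println(nearestN(1L, active, 50, 0.1, 1.0, new Random(42))); // [2, 3]
    }
}
```

In this sketch each neighborhood computation costs one similarity evaluation per candidate user. Admitting 50K or 200K users instead of 25K therefore multiplies the per-request work by roughly 2x or 8x at a sampling rate of 1.0. Lowering samplingRate cuts that work proportionally, at the cost of possibly missing close neighbors, which is the quality trade-off Otis mentions.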