Hello, I'm using TanimotoCoefficientSimilarity. With or without a Rescorer, virtually all of the time is spent in TanimotoCoefficientSimilarity.itemCorrelation (see the thread dump below). I have not profiled things yet, but looking at TanimotoCoefficientSimilarity.itemCorrelation I don't see much room for performance improvement.
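For readers unfamiliar with the measure, here is a minimal sketch of a Tanimoto coefficient between two items, each represented by the set of users who expressed a preference for it. This is not Mahout's actual code; the class name, method name, and the set-of-user-IDs representation are assumptions made for illustration. It shows why there is little to optimize inside a single call: the work is one set intersection per item pair, so the cost comes mostly from how many pairs get compared.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not the Mahout implementation.
class TanimotoSketch {

    // Tanimoto coefficient: |A intersect B| / |A union B|.
    // Returns NaN when both sets are empty (the ratio is undefined).
    static double tanimoto(Set<Long> usersA, Set<Long> usersB) {
        if (usersA.isEmpty() && usersB.isEmpty()) {
            return Double.NaN;
        }
        // Iterate over the smaller set and probe the larger one.
        Set<Long> smaller = usersA.size() <= usersB.size() ? usersA : usersB;
        Set<Long> larger = (smaller == usersA) ? usersB : usersA;
        int intersection = 0;
        for (Long user : smaller) {
            if (larger.contains(user)) {
                intersection++;
            }
        }
        int union = usersA.size() + usersB.size() - intersection;
        return (double) intersection / union;
    }

    public static void main(String[] args) {
        Set<Long> itemA = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> itemB = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        // Two shared users out of four distinct users.
        System.out.println(tanimoto(itemA, itemB)); // prints 0.5
    }
}
```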
So how can this puppy scale? From what I can tell so far, the only way to scale is to pre-compute recommendations for all users ahead of time and simply store them somewhere (e.g. DB, FS, memcached) for a quick user->recommendations lookup. It looks like real-time computation is out of the question.

Since CF/Taste sort of requires access to all users' data in order to compute recommendations, I don't yet see how the data could be broken into smaller chunks and processed in a distributed, MapReduce-style fashion... or does anyone see how this could be done? [1]

I looked at Ian's emails again and see that he, too, says there is no real-time aspect in their system. It also looks like they do aggregation and store aggregation summaries in a DB for quick lookup, but don't really use Taste for recommending items to individual users.

[1] But this really brings me back to a thread from the end of August, whose key messages are:
http://markmail.org/message/jo66sxyyn2pklsgv
http://markmail.org/message/cfntfbhshn5qz36n
http://markmail.org/message/27ijhgs4ghpr6cjv
http://markmail.org/message/eu3npmt7ggzc2jaq

It sounds like the next things to try are TreeClusteringRecommender and TreeClusteringRecommender2...
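To make the pre-compute-and-look-up idea concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: the class and method names are made up, the in-memory map merely stands in for a DB, FS, or memcached, and the `slowRecommender` function stands in for whatever expensive recommender call gets run offline. It is not a Taste API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Hypothetical sketch: batch-compute recommendations, then serve by lookup only.
class PrecomputedRecommendations {

    // Stand-in for an external store (DB, FS, memcached).
    private final Map<Long, List<Long>> byUser = new ConcurrentHashMap<>();

    // Offline step: run the expensive recommender once per user and store the result.
    void rebuild(Iterable<Long> userIds, LongFunction<List<Long>> slowRecommender) {
        for (long userId : userIds) {
            List<Long> items = new ArrayList<>(slowRecommender.apply(userId));
            byUser.put(userId, Collections.unmodifiableList(items));
        }
    }

    // Online step: a plain lookup, with no similarity computation at request time.
    List<Long> lookup(long userId) {
        return byUser.getOrDefault(userId, Collections.<Long>emptyList());
    }
}
```

The obvious trade-off is freshness: recommendations are only as current as the last rebuild, and users who appear between rebuilds get an empty list until the next one.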
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

"qtp0-0" prio=10 tid=0x08af3000 nid=0x5b94 runnable [0x6c0c6000..0x6c0c6fc0]
   java.lang.Thread.State: RUNNABLE
        at org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity.itemCorrelation(TanimotoCoefficientSimilarity.java:161)
        at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.doEstimatePreference(GenericItemBasedRecommender.java:206)
        at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.access$400(GenericItemBasedRecommender.java:59)
        at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender$Estimator.estimate(GenericItemBasedRecommender.java:265)
        at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender$Estimator.estimate(GenericItemBasedRecommender.java:256)
        at org.apache.mahout.cf.taste.impl.recommender.TopItems.getTopItems(TopItems.java:54)
        at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.recommend(GenericItemBasedRecommender.java:101)
        at org.apache.mahout.cf.taste.impl.recommender.AbstractRecommender.recommend(AbstractRecommender.java:52)
        at org.apache.mahout.cf.taste.impl.recommender.CachingRecommender$RecommendationRetriever.get(CachingRecommender.java:170)
        at org.apache.mahout.cf.taste.impl.recommender.CachingRecommender$RecommendationRetriever.get(CachingRecommender.java:158)
        at org.apache.mahout.cf.taste.impl.common.Cache.getAndCacheValue(Cache.java:102)
        at org.apache.mahout.cf.taste.impl.common.Cache.get(Cache.java:76)
        at org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.recommend(CachingRecommender.java:93)
