Quick report that, sadly, TreeClusteringRecommender (TreeClusteringRecommender2 
actually!) is a no-go, too.  It's been running for well over an hour over the 
same amount of data, and this is where it's been spending its time:

"qtp0-0" prio=10 tid=0x6bada400 nid=0x7551 runnable [0x6bfec000..0x6bfed140]
   java.lang.Thread.State: RUNNABLE
    at org.apache.mahout.cf.taste.impl.recommender.TreeClusteringRecommender2.findClosestClusters(TreeClusteringRecommender2.java:428)
    at org.apache.mahout.cf.taste.impl.recommender.TreeClusteringRecommender2.mergeClosestClusters(TreeClusteringRecommender2.java:331)
    at org.apache.mahout.cf.taste.impl.recommender.TreeClusteringRecommender2.buildClusters(TreeClusteringRecommender2.java:313)
    at org.apache.mahout.cf.taste.impl.recommender.TreeClusteringRecommender2.checkClustersBuilt(TreeClusteringRecommender2.java:230)
    at org.apache.mahout.cf.taste.impl.recommender.TreeClusteringRecommender2.recommend(TreeClusteringRecommender2.java:159)
    at org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.recommend(CachingRecommender.java:110)



This is how I'm using it.

        recommender = new TreeClusteringRecommender2(model,
                new NearestNeighborClusterSimilarity(
                        new TanimotoCoefficientSimilarity(model)), 0.5);
        recommender = new CachingRecommender(recommender);

Not sure, at this point, whether a value for that last argument (0.5 above) 
closer to 0.0 or to 1.0 yields faster computation (at the cost of suboptimal 
clustering).
So, I'm guessing that TreeClusteringRecommender2 may also not be an option when 
working with a non-trivial dataset:

$ # number of distinct users
$ cut -d, -f1 input.txt | sort | uniq | wc -l
899308

$ # number of distinct items
$ cut -d, -f2 input.txt | sort | uniq | wc -l
60302
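
A rough sense of scale for the user count above. This assumes the clustering is agglomerative and starts from one cluster per user, which is a guess based on the findClosestClusters / mergeClosestClusters frames in the stack trace and has not been checked against the Mahout source. Under that assumption, finding the closest pair among n clusters means looking at up to n(n-1)/2 pairs. The class and method names below are made up for illustration and are not part of Taste.

```java
final class ClusterPairCount {

    // Number of unordered pairs among n clusters: n * (n - 1) / 2.
    // A long is wide enough here; 899308 * 899307 is far below Long.MAX_VALUE.
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        long users = 899_308L; // distinct users, from the count above
        System.out.println("cluster pairs to compare in the first pass: "
                + pairCount(users));
    }
}
```

For 899,308 users that is about 4 x 10^11 candidate pairs for the first merge alone, which would be consistent with the thread sitting in findClosestClusters for over an hour.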

Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Otis Gospodnetic <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, October 31, 2008 1:12:04 PM
> Subject: Taste: real-time no go - distributed pre-computing?
> 
> Hello,
> 
> I'm using TanimotoCoefficientSimilarity.  With or without Rescorer, virtually 
> all time gets spent in TanimotoCoefficientSimilarity.itemCorrelation (see 
> below).
> I have not profiled things yet, but looking at
> TanimotoCoefficientSimilarity.itemCorrelation I don't see much room for
> performance improvement.
> 
> So how can this puppy scale?  From what I can tell so far, the only way
> to scale is to pre-compute recommendations for all users ahead
> of time and simply store them somewhere (e.g. DB, FS, memcached) for a quick
> user->recommendations lookup.  It looks like real-time computation
> is out of the question.  Since CF/Taste sort of requires access to all
> users' data in order to compute recommendations, I don't yet see how
> data could be broken into smaller chunks and processed in a
> distributed, MapReduce-style fashion... or does anyone see how this
> could be done? [1]
> 
> I looked at Ian's emails again and see that he, too, says there is no
> real-time aspect in their system, plus it looks like they do aggregation and
> store aggregation summaries for quick lookup in a DB, but don't really use
> Taste for recommending items to individual users.
> 
> [1]
> But this really brings me back to a thread from the end of August, whose key
> messages are:
> 
> http://markmail.org/message/jo66sxyyn2pklsgv
> http://markmail.org/message/cfntfbhshn5qz36n
> http://markmail.org/message/27ijhgs4ghpr6cjv
> http://markmail.org/message/eu3npmt7ggzc2jaq
> 
> It sounds like the next things to try are TreeClusteringRecommender and
> TreeClusteringRecommender2...
> 
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> "qtp0-0" prio=10 tid=0x08af3000 nid=0x5b94 runnable [0x6c0c6000..0x6c0c6fc0]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity.itemCorrelation(TanimotoCoefficientSimilarity.java:161)
>     at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.doEstimatePreference(GenericItemBasedRecommender.java:206)
>     at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.access$400(GenericItemBasedRecommender.java:59)
>     at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender$Estimator.estimate(GenericItemBasedRecommender.java:265)
>     at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender$Estimator.estimate(GenericItemBasedRecommender.java:256)
>     at org.apache.mahout.cf.taste.impl.recommender.TopItems.getTopItems(TopItems.java:54)
>     at org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender.recommend(GenericItemBasedRecommender.java:101)
>     at org.apache.mahout.cf.taste.impl.recommender.AbstractRecommender.recommend(AbstractRecommender.java:52)
>     at org.apache.mahout.cf.taste.impl.recommender.CachingRecommender$RecommendationRetriever.get(CachingRecommender.java:170)
>     at org.apache.mahout.cf.taste.impl.recommender.CachingRecommender$RecommendationRetriever.get(CachingRecommender.java:158)
>     at org.apache.mahout.cf.taste.impl.common.Cache.getAndCacheValue(Cache.java:102)
>     at org.apache.mahout.cf.taste.impl.common.Cache.get(Cache.java:76)
>     at org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.recommend(CachingRecommender.java:93)
