Hi Sean,

Thanks for your input.
Let me see if I understand correctly what you've described. Given a data set with two users and their associated browsing and order history:

User 1 browsed: P1, P2, P3, P4; purchased: P2, P3
User 2 browsed: P2, P4, P6; purchased: P2, P4

in order to support the generation of recommendations I am now computing the item pairs and their pair counts (minsup=0):

<P1, P2> 1
<P1, P3> 1
<P2, P2> 2
<P2, P3> 1
<P2, P4> 1
<P3, P2> 1
<P3, P3> 1
<P4, P2> 2
<P4, P3> 1
<P4, P4> 1
<P6, P2> 1
<P6, P4> 1

This lets me provide a simplistic recommendation:

Browsed P2 -> Purchased: P2 2 times / 50%, P3 1 time / 25%, P4 1 time / 25%

If I want to use Mahout to arrive at a similar result, I need to process my two sources - one containing browse events and one containing purchase events - into a single input that looks like the pair table described above, including the pair counts. Then, using this item-pairs table as input, I could use any recommender and similarity to get to the expected result. Is my understanding correct?

One question I have regarding this approach: how can I tell the algorithm that I have already precomputed the counts? I assume this is something the algorithm would otherwise do itself, so it needs to know that it has been done externally. Given the size of the pair table (for >1M users, 500K items and >1Bln transactions I'm estimating the purchase pairs could potentially get into the tens of billions of data points), this step seems to be by far the most resource-intensive. Would there be any other way that doesn't require this step to be executed prior to running a Mahout recommender? It seems unlikely from what I hear - but I hope there's a solution that doesn't involve generating that much data in the database.

I've just started looking into Mahout, so my questions might not be too concise at this point :) Hopefully that will change as I understand more about the concepts behind it. Great book btw.

Thanks,
Sebastian

On Apr 15, 2010, at 8:23 AM, Sean Owen wrote:

The framework is pretty general, so yeah, you can get it to do most anything, though some things might need more custom code than others.

Viewed generally, a recommender takes as input associations from As to Bs, and then, given an A, predicts new associations to Bs. Usually we think of As as users and Bs as items. But you could let As be browsed items, and Bs be items that were ultimately purchased by users who browsed A. Then this is a recommender problem, not merely a simpler most-similar-items problem. Given an item being browsed, you can recommend items that are most likely to be purchased.

The work you'd have to do is simply assembling these associations in the first place. You'd dig through your purchase and browsing data and output all item-item pairs where item 1 is a browsed item and item 2 is an item that was ultimately purchased by one or more users who browsed the first item. The value might be the number of users who fit this description.

Once you have that input you can throw any of the recommenders at it to produce the output. You'd have more choice, including distributed recommenders, and have access to evaluators as well. No custom code ought to be needed unless you want to write some.

On Thu, Apr 15, 2010 at 1:10 PM, Sebastian Feher <[email protected]> wrote:

There are a few questions that I'm not able to answer:

- Do you support cross-type frequent item sets? For example: people who browsed this item ended up purchasing these items. In this case the item pairs are generated by taking one item from the Browse space and the other from the Purchase space.
Is this something that can be achieved with the current algorithms (GenericItemBasedRecommender.mostSimilarItems(), FP-Growth) in their existing form? If not, is there an extension mechanism that lets me do that in a clean fashion, or do I have to modify the algorithm code?
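[Editor's note] For concreteness, here is a minimal, non-distributed sketch of the setup Sean describes. It assumes the aggregated pairs have already been written out to a file - the name browse_purchase_pairs.csv is just a placeholder - with one line per pair of the form browsedItemID,purchasedItemID,count, and numeric IDs standing in for P1, P2, etc. The browsed item plays the role of the "user" and the purchased item the role of the "item"; TanimotoCoefficientSimilarity is only one of several similarities that could be plugged in.

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class BrowseToPurchaseRecommender {

  public static void main(String[] args) throws Exception {
    // Each line of the (hypothetical) file is browsedItemID,purchasedItemID,count:
    // the browsed item stands in for the "user", the purchased item for the
    // "item", and the pair count is the preference value.
    DataModel model = new FileDataModel(new File("browse_purchase_pairs.csv"));

    // Any ItemSimilarity would do here; Tanimoto only looks at which
    // browse/purchase pairs co-occur and ignores the count values.
    ItemSimilarity similarity = new TanimotoCoefficientSimilarity(model);
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    // Predict purchases for browsed item P2 (encoded as ID 2 in the file).
    List<RecommendedItem> recommendations = recommender.recommend(2L, 3);
    for (RecommendedItem rec : recommendations) {
      System.out.println(rec.getItemID() + " : " + rec.getValue());
    }
  }
}

Note that recommend() predicts new browse-to-purchase associations for a browsed item, i.e. purchases not already counted for it in the pair table; ranking the already-counted purchases is the most-similar-items view sketched next.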
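[Editor's note] On the question of telling the algorithm that the counts are already precomputed: one option (an assumption here, not something stated in the thread) is to rescale each precomputed pair count into a similarity value in [-1, 1] - for instance the per-browsed-item percentages above - and hand those to GenericItemSimilarity, which serves precomputed item-item similarities instead of deriving them from the data model. mostSimilarItems() then works directly off those values. A sketch using the P2 row of the example table, with purchases.csv as a placeholder data file:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class PrecomputedPairSimilarityExample {

  public static void main(String[] args) throws Exception {
    // Precomputed browse->purchase associations, rescaled from raw pair counts
    // into [-1, 1]: here the P2 row of the example, <P2,P3> 25% and <P2,P4> 25%.
    List<GenericItemSimilarity.ItemItemSimilarity> precomputed =
        new ArrayList<GenericItemSimilarity.ItemItemSimilarity>();
    precomputed.add(new GenericItemSimilarity.ItemItemSimilarity(2L, 3L, 0.25));
    precomputed.add(new GenericItemSimilarity.ItemItemSimilarity(2L, 4L, 0.25));

    // GenericItemSimilarity serves these values as-is; nothing is recomputed.
    GenericItemSimilarity similarity = new GenericItemSimilarity(precomputed);

    // The recommender still needs a DataModel; purchases.csv is a placeholder
    // userID,itemID[,value] file listing the purchase events.
    DataModel model = new FileDataModel(new File("purchases.csv"));
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    // Items most similar to browsed item P2 (ID 2), i.e. its likeliest purchases.
    for (RecommendedItem rec : recommender.mostSimilarItems(2L, 2)) {
      System.out.println(rec.getItemID() + " : " + rec.getValue());
    }
  }
}

One caveat with this route: as far as I can tell, GenericItemSimilarity treats each item pair as unordered, so the direction of the browse-to-purchase relation is lost; Sean's framing, where browsed items act as the "users", preserves that direction.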
