Hi Sean,

Thanks for your input.

Let me see if I understand correctly what you've described.

Given a data set with two users with their associated browsing and order 
history:

User 1
browsed: P1, P2, P3, P4
purchased: P2, P3

User 2
browsed: P2, P4, P6
purchased: P2, P4

In order to support the generation of recommendations, I am now computing 
the item pairs and their counts (minsup=0):

<P1, P2> 1 
<P1, P3> 1

<P2, P2> 2
<P2, P3> 1
<P2, P4> 1

<P3, P2> 1
<P3, P3> 1
 
<P4, P2> 2
<P4, P3> 1
<P4, P4> 1

<P6, P2> 1
<P6, P4> 1

This lets me provide a simplistic recommendation:
Browsed P2 -> Purchased: P2 2 times/50%, P3 1 time/25%, P4 1 time/25%
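
In code, the pair counting I have in mind is roughly this - just a toy 
sketch hard-coding the two users above, not the job I'd actually run at 
scale:

import java.util.*;

// Emit one <browsedItem, purchasedItem> pair per combination for each
// user, then count how often each pair occurs across all users.
public class PairCounter {
  public static void main(String[] args) {
    Map<String, List<String>> browsed = new HashMap<String, List<String>>();
    Map<String, List<String>> purchased = new HashMap<String, List<String>>();
    browsed.put("U1", Arrays.asList("P1", "P2", "P3", "P4"));
    purchased.put("U1", Arrays.asList("P2", "P3"));
    browsed.put("U2", Arrays.asList("P2", "P4", "P6"));
    purchased.put("U2", Arrays.asList("P2", "P4"));

    Map<String, Integer> pairCounts = new TreeMap<String, Integer>();
    for (String user : browsed.keySet()) {
      for (String b : browsed.get(user)) {
        for (String p : purchased.get(user)) {
          String pair = "<" + b + ", " + p + ">";
          Integer count = pairCounts.get(pair);
          pairCounts.put(pair, count == null ? 1 : count + 1);
        }
      }
    }
    for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
      System.out.println(e.getKey() + " " + e.getValue());
    }
  }
}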

If I want to use Mahout to arrive at a similar result, I need to process my 
two sources - containing browse and purchase events - into one that looks 
like the pair table described above, including the pair counts. Then, using 
this item-pair table as input, I could use any recommender and similarity to 
get to the expected result. Is my understanding correct? One question I have 
regarding this approach: how can I tell the algorithm that I have already 
precomputed the counts? I assume this is something the algorithm would 
otherwise do itself, so it needs to know that it has already been done 
externally.
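
To make that concrete, here is roughly what I'm picturing (just a sketch - 
the file name and the numeric IDs are made up, and I'm assuming the browsed 
item can play the "user" role, the purchased item the "item" role, and the 
precomputed count the preference value):

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

public class LoadPrecomputedPairs {
  public static void main(String[] args) throws Exception {
    // Hypothetical file with one line per precomputed pair:
    //   browsedItemID,purchasedItemID,count
    // e.g. for the toy data above (IDs 1..6 standing in for P1..P6):
    //   2,2,2
    //   2,3,1
    //   2,4,1
    // The precomputed count simply becomes the preference value.
    DataModel model = new FileDataModel(new File("browse_purchase_pairs.csv"));
    System.out.println(model.getNumUsers() + " browsed items, "
        + model.getNumItems() + " purchasable items");
  }
}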

Given the size of the pair table (for >1M users, 500K items and >1Bln 
transactions I estimate the purchase pairs could potentially run into the 
tens of billions of data points), this step seems to be by far the most 
resource intensive. Would there be any other way that doesn't require this 
step to be executed prior to running a Mahout recommender? It seems unlikely 
from what I hear - but I hope there's a solution that doesn't involve 
generating that much data in the database.

I've just started looking into Mahout - so my questions might not be too 
concise at this point :)
Hopefully that will change as I understand more about the concepts behind it. 
Great book btw.

Thanks.
Sebastian

On Apr 15, 2010, at 8:23 AM, Sean Owen wrote:

The framework is pretty general, so yeah you can get it to do most
anything, though some things might need more custom code than others.

Viewed generally, a recommender takes as input associations from As to
Bs, and then given an A, predicts new associations to Bs. Usually we
think of As as users and Bs as items. But you could let As be browsed
items, and Bs be items that were ultimately purchased by users who
browsed A.

Then this is a recommender problem, not merely a simpler
most-similar-items problem. Given an item being browsed, you can
recommend items that are most likely to be purchased.

The work you'd have to do is simply assembling these associations in
the first place. You'd dig through your purchase and browsing data,
and output all item-item pairs where item 1 is a browsed item and item
2 is an item that was ultimately purchased by one or more users who
browsed the first item. The value might be the number of users who fit
this description.

Once you have that input you can throw any of the recommenders at it
to produce the output. You'd have more choice, including distributed
recommenders, and have access to evaluators as well. No custom code
ought to be needed unless you want to write some.
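
As a rough sketch - assuming you've written the pairs to a CSV that
FileDataModel can read, with the browsed item in the "user" column, the
purchased item in the "item" column and the count as the value (the file
name and similarity choice here are arbitrary, just to show the wiring):

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class BrowseToPurchaseRecommender {
  public static void main(String[] args) throws Exception {
    // "Users" are browsed items, "items" are purchased items, and the
    // preference values are the precomputed pair counts.
    DataModel model = new FileDataModel(new File("browse_purchase_pairs.csv"));
    // One possible similarity; swap in any other ItemSimilarity. Note that
    // this particular one ignores the counts and treats pairs as boolean.
    ItemSimilarity similarity = new TanimotoCoefficientSimilarity(model);
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, similarity);

    // Given that someone is browsing item 2 (P2 in the toy example),
    // recommend the items most likely to be purchased:
    List<RecommendedItem> recs = recommender.recommend(2L, 3);
    for (RecommendedItem rec : recs) {
      System.out.println(rec.getItemID() + " : " + rec.getValue());
    }
  }
}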


On Thu, Apr 15, 2010 at 1:10 PM, Sebastian Feher <[email protected]> wrote:
There are a few questions that I'm not able to answer:
- Do you support cross-type frequent item sets? For example: people who 
browsed this item ended up purchasing these items. In this case the item 
pairs are generated by taking one item from the Browse space and the other 
from the Purchase space. Is this something that can be achieved with the 
current algorithms (GenericItemBasedRecommender.mostSimilarItems(), 
FP-Growth) in their existing form? If not, is there an extension mechanism 
that allows me to do that in a clean fashion, or do I have to modify the 
algorithm code?

