Another way to look at the problem is to consider user purchases/actions as features describing a user in a vector space. Then the problem is reduced to finding users similar to each other based on this feature set.
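To make that concrete, here is a rough sketch in plain Python, using the viewing history from Jay's example quoted further down. This is not Mahout code: the function names, the choice of cosine similarity over binary "has viewed" features, and the neighbourhood size k are my own assumptions for illustration.

```python
from math import sqrt

# Viewing history from Jay's example; each user is a binary feature
# vector over items, stored here as the set of items viewed.
views = {
    "John": {"jeans", "cowboy hat", "red shirt", "boots"},
    "Scott": {"jeans", "cowboy hat", "red shirt", "pocket watch"},
    "Larry": {"jeans", "cowboy hat", "red shirt"},
}


def cosine(a, b):
    """Cosine similarity of two binary item vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, views, k=2):
    """Rank items the user has not seen by the summed similarity
    of the k most similar users who have seen them."""
    seen = views[user]
    neighbours = sorted(
        ((cosine(seen, items), other)
         for other, items in views.items() if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbours:
        for item in views[other] - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("Larry", views))
```

With those three users this prints ['boots', 'pocket watch'], which is what Jay expects for Larry, and it needs no explicit clustering step.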
Clustering would be overly complex, in my humble opinion. I agree with Sean that the Lucene-based construction as you describe it, Jay, is item-based and not user-based.

Hope this helps.

> Date: Wed, 23 Jun 2010 08:59:03 +0100
> Subject: Re: User/Items Reco Engine clustering
> From: [email protected]
> To: [email protected]
>
> To me you're just describing user-based recommendation. You find a
> neighborhood of similar users, then examine their items, and recommend
> from those by taking a weighted average of the neighborhood's
> preferences.
>
> Your Lucene-based construction then sounds like item-based
> recommendation. Find items similar to what the user prefers and
> recommend based on a weighted average, again.
>
> Do I have that right?
>
> And then, do you need a Hadoop-based implementation using SequenceFiles?
> What kind of data size are you looking at?
>
> On Wed, Jun 23, 2010 at 12:49 AM, Jay Sellers <[email protected]> wrote:
> > Thanks Vivek,
> > We do not have predefined clusters/groups. We expect the groups to mutate as
> > more history (data) is accumulated. A simple use case is as follows:
> > John has viewed a pair of jeans, a cowboy hat, a red shirt and a pair of
> > boots.
> > Scott has viewed a pair of jeans, a cowboy hat, a red shirt and a pocket
> > watch.
> > Larry has viewed a pair of jeans, a cowboy hat and a red shirt.
> >
> > When we send Larry and his items into our reco engine, we would expect a
> > pair of boots and a pocket watch to be recommended. We'd expect this
> > because we've determined that John and Scott are 'like' Larry and thus are
> > in the same cluster.
> >
> > Again, we fully expect the cluster members to change, as user/item data
> > accumulates.
> >
> > On Tue, Jun 22, 2010 at 4:37 PM, Vivek Khanna
> > <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> For your clustering/grouping, what is your expectation?
> >> Do you have pre-defined clusters/groups that you want to cluster the items
> >> within those, or do you envision a system where clusters/groups will
> >> change and evolve as the data changes?
> >>
> >> In each case, it seems you are looking for unsupervised approaches. Is that
> >> correct?
> >>
> >> I am new to this email list, so pardon my ignorance, but from what work I
> >> have done in the past with IR, ML (clustering, More like this,
> >> categorization, topic detection etc.), my advice to you is to identify your
> >> requirements, use cases and page flow interactions as the first step. :)
> >>
> >> Hope this helps!
> >>
> >> Vivek.
> >>
> >> > Date: Tue, 22 Jun 2010 15:50:18 -0700
> >> > Subject: User/Items Reco Engine clustering
> >> > From: [email protected]
> >> > To: [email protected]
> >> >
> >> > I'm looking to enhance a product recommendation engine. It currently
> >> > works with all data as a whole. I want to introduce clustering/grouping.
> >> > It's model-based and the relationship is the common User-Items relationship.
> >> > Originally I was thinking of using a Canopy / kmeans cluster. And then
> >> > determine which cluster a user is in and execute Item Similarity against
> >> > only that cluster of items. However I can't figure out how to build a
> >> > SequenceFile using vectors with the User/Items relationship. I don't know
> >> > which data points to feed the vector. So I scratched that idea and turned
> >> > my attention to Lucene, thinking that this is really a document issue,
> >> > where users are documents and the items are the content. I should be able
> >> > to ask Lucene, give me documents that look like this "productId3
> >> > productId9056 productId234".
> >> >
> >> > I'm looking for any and all feedback from those experienced in the
> >> > recommendation world, specifically with the grouping of users and items.
> >> >
> >> > Thanks,
> >> > -Jay
