To me you're just describing user-based recommendation. You find a neighborhood of similar users, then examine their items, and recommend from those by taking a weighted average of the neighborhood's preferences.
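To make that concrete, here is a minimal Python sketch of the user-based approach applied to the jeans/hat/shirt example from the quoted message below. It assumes binary "viewed" data and uses Jaccard similarity to pick the neighborhood; the function names (`jaccard`, `recommend`) and the choice of similarity metric are illustrative assumptions, not anything taken from Mahout or from your current engine.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two item sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, views, k=2):
    """User-based recommendation over binary 'viewed' data.

    Each item the target has not seen is scored by the similarity-weighted
    average of the k nearest neighbors' preferences (1 if viewed, else 0).
    """
    seen = views[target]
    # Neighborhood: the k users most similar to the target.
    neighbors = sorted(
        ((jaccard(seen, items), user)
         for user, items in views.items() if user != target),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return []  # nobody overlaps with the target at all
    scores = defaultdict(float)
    for sim, user in neighbors:
        for item in views[user] - seen:
            scores[item] += sim / total
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data mirroring the use case in the quoted message.
views = {
    "John": {"jeans", "cowboy hat", "red shirt", "boots"},
    "Scott": {"jeans", "cowboy hat", "red shirt", "pocket watch"},
    "Larry": {"jeans", "cowboy hat", "red shirt"},
}

print(recommend("Larry", views))
```

With this data John and Scott are equally similar to Larry, so the boots and the pocket watch come back with equal scores. The neighborhood is recomputed on every call, so group membership shifts on its own as more history accumulates, with no explicit clustering step.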
Your Lucene-based construction then sounds like item-based recommendation. Find items similar to what the user prefers and recommend based on a weighted average, again. Do I have that right? And then, do you need a Hadoop-based implementation using SequenceFiles? What kind of data size are you looking at?

On Wed, Jun 23, 2010 at 12:49 AM, Jay Sellers <[email protected]> wrote:
> Thanks Vivek,
> We do not have predefined clusters/groups. We expect the groups to mutate as
> more history (data) is accumulated. A simple use case is as follows:
> John has viewed a pair of jeans, a cowboy hat, a red shirt and a pair of
> boots.
> Scott has viewed a pair of jeans, a cowboy hat, a red shirt and a pocket
> watch.
> Larry has viewed a pair of jeans, a cowboy hat and a red shirt.
>
> When we send Larry and his items into our reco engine, we would expect a
> pair of boots and a pocket watch to be recommended. We'd expect this
> because we've determined that John and Scott are 'like' Larry and thus are
> in the same cluster.
>
> Again, we fully expect the cluster members to change, as user/item data
> accumulates.
>
> On Tue, Jun 22, 2010 at 4:37 PM, Vivek Khanna <[email protected]> wrote:
>
>> Hi,
>>
>> For your clustering/grouping, what is your expectation? Do you have
>> pre-defined clusters/groups that you want to cluster the items within those,
>> or do you envision a system where clusters/groups will change and evolve as
>> the data changes?
>>
>> In each case, it seems you are looking for unsupervised approaches. Is that
>> correct?
>>
>> I am new to this email list, so pardon my ignorance, but from what work I
>> have done in the past with IR, ML (clustering, More like this,
>> categorization, topic detection etc.), my advice to you is to identify your
>> requirements, use cases and page flow interactions as the first step. :)
>>
>> Hope this helps!
>>
>> Vivek.
>>
>> > Date: Tue, 22 Jun 2010 15:50:18 -0700
>> > Subject: User/Items Reco Engine clustering
>> > From: [email protected]
>> > To: [email protected]
>> >
>> > I'm looking to enhance a product recommendation engine. It currently works
>> > with all data as a whole. I want to introduce clustering/grouping. Its
>> > model based and the relationship is the common User-Items relationship.
>> > Originally I was thinking of using a Canopy / kmeans cluster. And then
>> > determine which cluster a user is in and execute Item Similarity against
>> > only that cluster of items. However I can't figure out how to build a
>> > SequenceFile using vectors with the User/Items relationship. I don't know
>> > which data points to feed the vector. So I scratched that idea and turned
>> > my attention to Lucene, thinking that this is really a document issue. Where
>> > users are documents and the items are the content. I should be able to ask
>> > Lucene, give me documents that look like this "productId3 productId9056
>> > productId234".
>> >
>> > I'm looking for any and all feedback from those experienced in the
>> > recommendation world, specifically with the grouping of users and items.
>> >
>> > Thanks,
>> > -Jay
