To me you're just describing user-based recommendation. You find a
neighborhood of similar users, then examine their items, and recommend
from those by taking a weighted average of the neighborhood's
preferences.
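To make the user-based description concrete, here is a minimal sketch applied to the John/Scott/Larry example from Jay's message quoted below. It is plain Python, not Mahout's API. Several choices are assumptions: Jaccard overlap as the user similarity, a neighborhood size of k=2, "not viewed" counted as 0.0 in the weighted average, and the item IDs themselves (hypothetical stand-ins for real product IDs).

```python
from typing import Dict, List, Set, Tuple

# Boolean "viewed" data from Jay's example: user -> set of item IDs.
# The item IDs are hypothetical stand-ins for real product IDs.
VIEWS: Dict[str, Set[str]] = {
    "John": {"jeans", "cowboy_hat", "red_shirt", "boots"},
    "Scott": {"jeans", "cowboy_hat", "red_shirt", "pocket_watch"},
    "Larry": {"jeans", "cowboy_hat", "red_shirt"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed user similarity: shared items / all items either user viewed."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_user_based(
    user: str, data: Dict[str, Set[str]], k: int = 2
) -> List[Tuple[str, float]]:
    mine = data[user]
    # 1. Neighborhood: the k most similar other users with nonzero similarity.
    sims = [(other, jaccard(mine, items)) for other, items in data.items() if other != user]
    neighbors = sorted((s for s in sims if s[1] > 0), key=lambda s: -s[1])[:k]
    total = sum(sim for _, sim in neighbors)
    if total == 0:
        return []
    # 2. Candidates: items some neighbor viewed that the user has not.
    candidates = set().union(*(data[n] for n, _ in neighbors)) - mine
    # 3. Score: similarity-weighted average of the neighbors' preferences,
    #    counting "viewed" as 1.0 and "not viewed" as 0.0.
    scores = {
        item: sum(sim * (1.0 if item in data[n] else 0.0) for n, sim in neighbors) / total
        for item in candidates
    }
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(recommend_user_based("Larry", VIEWS))  # [('boots', 0.5), ('pocket_watch', 0.5)]
```

Larry's neighbors are John and Scott (similarity 0.75 each), so the boots and the pocket watch come back, which is the result Jay expects. No explicit clustering step is needed; the neighborhood is recomputed from whatever data exists at query time, so it changes as the data accumulates.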

Your Lucene-based construction then sounds like item-based
recommendation. Find items similar to what the user prefers and
recommend based on a weighted average, again.
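An item-based counterpart can be sketched the same way, again in plain Python rather than Mahout's API. Assumptions: item similarity is the Jaccard overlap of the sets of users who viewed each item, and the helper names are hypothetical. One deliberate swap: with boolean "viewed" data, a weighted average of preferences that are all 1.0 is always 1.0, so this sketch ranks candidates by their summed similarity to the user's items, a common variant for boolean data.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Same hypothetical boolean data as Jay's example: user -> viewed item IDs.
VIEWS: Dict[str, Set[str]] = {
    "John": {"jeans", "cowboy_hat", "red_shirt", "boots"},
    "Scott": {"jeans", "cowboy_hat", "red_shirt", "pocket_watch"},
    "Larry": {"jeans", "cowboy_hat", "red_shirt"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def users_by_item(data: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert user -> items into item -> users who viewed it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for user, items in data.items():
        for item in items:
            index[item].add(user)
    return index


def recommend_item_based(user: str, data: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    index = users_by_item(data)
    mine = data[user]
    scores: Dict[str, float] = {}
    for candidate in set(index) - mine:
        # Assumed item similarity: Jaccard overlap of the users who viewed each item.
        # Summed over the user's items, since a weighted average of all-1.0
        # boolean preferences would always be 1.0.
        score = sum(jaccard(index[item], index[candidate]) for item in mine)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


for item, score in recommend_item_based("Larry", VIEWS):
    print(item, round(score, 3))
# boots 1.0
# pocket_watch 1.0
```

Here the similarity table is item-to-item rather than user-to-user, which is usually the more stable of the two when there are many more users than items.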

Do I have that right?

And then, do you need a Hadoop-based implementation using SequenceFiles?
What kind of data size are you looking at?
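On Jay's question of which data points to feed the vector: one common encoding (an assumption here, not something the thread settles) is one vector per user, with one dimension per item and 1.0 meaning "viewed". The sketch below builds that mapping in plain Python with hypothetical helper names; writing the vectors into a Hadoop SequenceFile for Mahout's Canopy/k-means jobs is deliberately left out.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical boolean data from Jay's example: user -> viewed item IDs.
VIEWS: Dict[str, Set[str]] = {
    "John": {"jeans", "cowboy_hat", "red_shirt", "boots"},
    "Scott": {"jeans", "cowboy_hat", "red_shirt", "pocket_watch"},
    "Larry": {"jeans", "cowboy_hat", "red_shirt"},
}


def build_user_vectors(
    data: Dict[str, Set[str]]
) -> Tuple[Dict[str, int], Dict[str, Dict[int, float]]]:
    # Give every distinct item a column index (sorted so the mapping is stable).
    all_items = sorted(set().union(*data.values()))
    item_index = {item: i for i, item in enumerate(all_items)}
    # Each user becomes a sparse vector: column = item index, value = 1.0 if viewed.
    vectors = {
        user: {item_index[item]: 1.0 for item in sorted(items)}
        for user, items in data.items()
    }
    return item_index, vectors


def cosine(u: Dict[int, float], v: Dict[int, float]) -> float:
    """Cosine similarity of two sparse vectors (1 - cosine can serve as a distance)."""
    dot = sum(value * v.get(i, 0.0) for i, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


item_index, vectors = build_user_vectors(VIEWS)
print(item_index)  # {'boots': 0, 'cowboy_hat': 1, 'jeans': 2, 'pocket_watch': 3, 'red_shirt': 4}
print(vectors["Larry"])  # {1: 1.0, 2: 1.0, 4: 1.0}
print(round(cosine(vectors["Larry"], vectors["John"]), 3))  # 0.866
```

Vectors of this shape are what a clustering algorithm would consume; for data small enough to fit in memory, the non-distributed approaches above may make the Hadoop step unnecessary.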

On Wed, Jun 23, 2010 at 12:49 AM, Jay Sellers <[email protected]> wrote:
> Thanks, Vivek,
> We do not have predefined clusters/groups. We expect the groups to change as
> more history (data) accumulates. A simple use case is as follows:
> John has viewed a pair of jeans, a cowboy hat, a red shirt and a pair of
> boots.
> Scott has viewed a pair of jeans, a cowboy hat, a red shirt and a pocket
> watch.
> Larry has viewed a pair of jeans, a cowboy hat and a red shirt.
>
> When we send Larry and his items into our reco engine, we would expect a
> pair of boots and a pocket watch to be recommended.  We'd expect this
> because we've determined that John and Scott are 'like' Larry and thus are
> in the same cluster.
>
> Again, we fully expect the cluster members to change, as user/item data
> accumulates.
>
> On Tue, Jun 22, 2010 at 4:37 PM, Vivek Khanna <[email protected]> wrote:
>
>>
>> Hi,
>>
>> For your clustering/grouping, what is your expectation? Do you have
>> predefined clusters/groups within which you want to cluster the items,
>> or do you envision a system where the clusters/groups change and evolve
>> as the data changes?
>>
>> In either case, it seems you are looking for unsupervised approaches. Is
>> that correct?
>>
>> I am new to this email list, so pardon my ignorance, but based on the work
>> I have done in the past with IR and ML (clustering, More Like This,
>> categorization, topic detection, etc.), my advice is to identify your
>> requirements, use cases and page-flow interactions as the first step. :)
>>
>> Hope this helps!
>>
>> Vivek.
>>
>> > Date: Tue, 22 Jun 2010 15:50:18 -0700
>> > Subject: User/Items Reco Engine clustering
>> > From: [email protected]
>> > To: [email protected]
>> >
>> > I'm looking to enhance a product recommendation engine. It currently works
>> > with all data as a whole. I want to introduce clustering/grouping. It's
>> > model-based, and the relationship is the common User-Items relationship.
>> > Originally I was thinking of using Canopy/k-means clustering, then
>> > determining which cluster a user is in and executing Item Similarity
>> > against only that cluster of items. However, I can't figure out how to
>> > build a SequenceFile of vectors from the User/Items relationship; I don't
>> > know which data points to feed into the vectors. So I scratched that idea
>> > and turned my attention to Lucene, thinking that this is really a document
>> > problem, where users are documents and items are the content. I should be
>> > able to ask Lucene to "give me documents that look like this: productId3
>> > productId9056 productId234".
>> >
>> > I'm looking for any and all feedback from those experienced in the
>> > recommendation world, specifically with the grouping of users and items.
>> >
>> > Thanks,
>> > -Jay
>>
>
