I can point you to 90% of what you need in the existing code. Look at package org.apache.mahout.cf.taste.hadoop.item first.
RecommenderJob runs several MRs to make recommendations, and along the way does what you want -- almost. It outputs user vectors -- for each user, a vector with item IDs as indices and pref values as coordinates. You want the transpose of that -- for each item, a vector with user IDs as indices, etc.

We can't use IDs in the recommender as indices directly, since IDs are longs, and vector dimensions are ints of course. So there's a first stage where we create a mapping from the real IDs to hashed indices. This is what ItemIDIndexMapper/Reducer do. You would just copy and tweak them to deal with user IDs.

Then ToItemPrefsMapper/ToUserVectorReducer team up to write out the vectors. Same thing -- just an exercise in swapping user IDs and item IDs.

The rest of the MRs don't matter to you. You could even copy RecommenderJob, cut out the other bits it runs, and have a ready-made driver. It's easier than it maybe sounds -- these are all quite small classes.

If it works, and you care to think through and contribute a clean refactoring that allows for generating item vectors as well as user vectors, I'd commit that. But feel free to just hack for your own purpose too.

Sean

On Sat, Feb 6, 2010 at 4:07 PM, Matthew Bryan <[email protected]> wrote:
> Is there a straightforward way to take a preference file that's used
> for a recommender (user_id, item_id, preference) and turn it into a
> vector that can be used for clustering? As part of my evaluation of
> Mahout I'd also like to cluster items and see how those simple
> clusters perform.
>
> Thanks!
>
> Matthew Bryan
>
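
To make the user/item swap concrete, here is a rough sketch written against the plain Hadoop MapReduce API rather than the actual Mahout classes. The class names (UserIDIndexMapper, ToUserPrefsMapper, ToItemVectorReducer) are made up for illustration, the hash is a stand-in for Mahout's own ID-to-index utility, and the Text intermediate value is a simplification of the Writable pair the real job uses -- so treat it as a guide to where the swap happens, not as drop-in code:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

/** Maps a long user ID to a hashed int index, analogous to what ItemIDIndexMapper does for item IDs. */
class UserIDIndexMapper extends Mapper<LongWritable, Text, IntWritable, LongWritable> {
  @Override
  protected void map(LongWritable key, Text line, Context ctx)
      throws IOException, InterruptedException {
    // Input line: userID,itemID,pref
    String[] tokens = line.toString().split(",");
    long userID = Long.parseLong(tokens[0]);
    // Keep (index -> real ID) so indices can be translated back later
    ctx.write(new IntWritable(idToIndex(userID)), new LongWritable(userID));
  }

  /** Illustrative hash of a long ID into a non-negative int index. */
  static int idToIndex(long id) {
    return 0x7FFFFFFF & (int) (id ^ (id >>> 32));
  }
}

/** Emits (itemIndex, "userIndex:pref") pairs -- the transpose of the user-vector pipeline. */
class ToUserPrefsMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
  @Override
  protected void map(LongWritable key, Text line, Context ctx)
      throws IOException, InterruptedException {
    String[] tokens = line.toString().split(",");
    long userID = Long.parseLong(tokens[0]);
    long itemID = Long.parseLong(tokens[1]);
    float pref = Float.parseFloat(tokens[2]);
    int userIndex = UserIDIndexMapper.idToIndex(userID);
    int itemIndex = UserIDIndexMapper.idToIndex(itemID);
    ctx.write(new IntWritable(itemIndex), new Text(userIndex + ":" + pref));
  }
}

/** Collects all (userIndex, pref) pairs for one item into a single sparse vector. */
class ToItemVectorReducer extends Reducer<IntWritable, Text, IntWritable, VectorWritable> {
  @Override
  protected void reduce(IntWritable itemIndex, Iterable<Text> prefs, Context ctx)
      throws IOException, InterruptedException {
    Vector vector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
    for (Text pref : prefs) {
      String[] parts = pref.toString().split(":");
      vector.set(Integer.parseInt(parts[0]), Double.parseDouble(parts[1]));
    }
    ctx.write(itemIndex, new VectorWritable(vector));
  }
}
```

The output of ToItemVectorReducer is one sparse vector per item, keyed by item index, which is the shape the clustering jobs expect; the saved index-to-ID mapping is what lets you recover the real item IDs from cluster output afterwards.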
