I can point you to 90% of what you need in the existing code. Look at
package org.apache.mahout.cf.taste.hadoop.item first.

RecommenderJob runs several MapReduce jobs (MRs) to make recommendations, and along the
way does what you want -- almost. It outputs user vectors -- for each
user, a vector with item IDs as indices and pref values as
coordinates. You want the transpose of that -- for each item, a vector
with user IDs as indices, etc.
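Concretely: say user 1 rated item 7 at 3.0 and item 9 at 5.0, and user 2
rated item 7 at 4.0. The two layouts look roughly like this, using Mahout's
sparse vectors, with small ints as indices purely for illustration (see the
next paragraph for why the real IDs can't be used directly):

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

// What RecommenderJob writes: one vector per user, dimensions are item indices
Vector user1 = new RandomAccessSparseVector(Integer.MAX_VALUE);
user1.set(7, 3.0);
user1.set(9, 5.0);

// What you want: one vector per item, dimensions are user indices
Vector item7 = new RandomAccessSparseVector(Integer.MAX_VALUE);
item7.set(1, 3.0);   // user 1's pref for item 7
item7.set(2, 4.0);   // user 2's pref for item 7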

We can't use IDs in the recommender as indices directly, since IDs are
longs, and vector dimensions are ints of course. So there's the first
stage where we create a mapping from the real IDs to hashed indices.
This is what ItemIDIndexMapper/Reducer do. You would just copy and
tweak them to deal with user IDs.
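The core of it is just hashing a long ID down to a non-negative int,
something like this (a sketch of the idea, not necessarily the exact code
you'll find in ItemIDIndexMapper):

// Squash a long user/item ID into a non-negative int usable as a vector index.
// Collisions are possible but rare enough not to matter much here.
static int idToIndex(long id) {
  return 0x7FFFFFFF & ((int) id ^ (int) (id >>> 32));
}

The mapper/reducer pair then basically emits (index, original ID) pairs and
keeps one ID per index, so you can translate vector dimensions back into real
IDs afterwards.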

Then ToItemPrefsMapper/ToUserVectorReducer team up to write out the
vectors. Same thing -- just an exercise in swapping user IDs and item
IDs.
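In rough outline, assuming a text pref file of userID,itemID,pref lines, the
swapped pair would look something like this (the class names are made up, and
the real classes use Mahout's own writables rather than Text, but the shape
is the same):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// Mapper: one "userID,itemID,pref" line in, (itemID, "userIndex,pref") out.
public class ToUserPrefsMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context ctx)
      throws IOException, InterruptedException {
    String[] tokens = line.toString().split(",");
    long userID = Long.parseLong(tokens[0]);
    long itemID = Long.parseLong(tokens[1]);
    float pref = Float.parseFloat(tokens[2]);
    // same hashing trick as above
    int userIndex = 0x7FFFFFFF & ((int) userID ^ (int) (userID >>> 32));
    ctx.write(new LongWritable(itemID), new Text(userIndex + "," + pref));
  }
}

// Reducer: gather all (userIndex, pref) pairs for one item into a sparse vector.
public class ToItemVectorReducer
    extends Reducer<LongWritable, Text, LongWritable, VectorWritable> {
  @Override
  protected void reduce(LongWritable itemID, Iterable<Text> prefs, Context ctx)
      throws IOException, InterruptedException {
    Vector itemVector = new RandomAccessSparseVector(Integer.MAX_VALUE);
    for (Text pref : prefs) {
      String[] tokens = pref.toString().split(",");
      itemVector.set(Integer.parseInt(tokens[0]), Double.parseDouble(tokens[1]));
    }
    ctx.write(itemID, new VectorWritable(itemVector));
  }
}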

The rest of the MRs don't matter to you. You could even copy
RecommenderJob and cut out the other bits it runs, and have a
ready-made driver.
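For example, a stripped-down driver using the hypothetical class names from
the sketch above, and assuming the clustering step wants a SequenceFile of
VectorWritables, is not much more than:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.mahout.math.VectorWritable;

public class ItemVectorsJob {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "item-vectors");
    job.setJarByClass(ItemVectorsJob.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(ToUserPrefsMapper.class);
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setReducerClass(ToItemVectorReducer.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(VectorWritable.class);
    // write a SequenceFile of vectors for the clustering jobs to consume
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // pref file
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // item vectors
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}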

It's easier than it may sound -- these are all quite small classes.


If it works, and you care to think through and contribute a clean
refactoring that allows for generating item vectors as well as user
vectors, I'd commit that. But feel free to just hack it for your own
purposes too.


Sean



On Sat, Feb 6, 2010 at 4:07 PM, Matthew Bryan <[email protected]> wrote:
> Is there a straightforward way to take a preference file that's used
> for a recommender (user_id, item_id, preference) and turn it into a
> vector that can be used for clustering? As part of my evaluation of
> Mahout I'd also like to cluster items and see how those simple
> clusters perform.
>
> Thanks!
>
> Matthew Bryan
>
