On Sat, Dec 12, 2009 at 11:08 PM, Jake Mannix <[email protected]> wrote:

> You're not computing only one recommendation at a time, are you?
> I really need to read through the hadoop.item code, but in general, what
> is the procedure here? If you're doing work on HDFS as a M/R job, you're
> doing a huge batch, right? You're saying the aggregate performance is
> 10 seconds per recommendation across millions of recommendations, or
> doing a one-shot task?
Recommendations are computed for one user at a time, by multiplying the
co-occurrence matrix by that user's preference vector. And then yes, it's
one big job invoking that computation for all users. I'm running this all
on one machine (my laptop), so it's effectively serialized anyway. Yes, it
was 10 seconds to compute all recommendations for one user; it's down to a
couple of seconds now with some more work. That's still rough, but not awful.

> offline). Can you give a quick review of which part of this is supposed
> to be on Hadoop, which parts are done live, a kind of big picture
> description of what's going on?

All of it is on Hadoop here. It's pretty simple: make the user vectors,
make the co-occurrence matrix (all of that is quite fast), then multiply
the two to produce recommendations.
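To make the multiply step concrete, here's a minimal plain-Java sketch of it
(not the actual hadoop.item code, and outside Hadoop entirely; the item IDs,
preference values, and counts are made up): for each item the user has rated,
its co-occurrence row is scaled by the preference and summed into the scores.

import java.util.HashMap;
import java.util.Map;

public class CooccurrenceSketch {

  public static void main(String[] args) {
    // Hypothetical co-occurrence matrix: itemA -> (itemB -> number of
    // users who expressed a preference for both items).
    Map<Long, Map<Long, Integer>> cooccurrence = new HashMap<>();
    put(cooccurrence, 1L, 2L, 3);
    put(cooccurrence, 1L, 3L, 1);
    put(cooccurrence, 2L, 3L, 2);

    // One user's preference vector: item -> preference value.
    Map<Long, Float> userPrefs = new HashMap<>();
    userPrefs.put(1L, 5.0f);
    userPrefs.put(2L, 3.0f);

    // Matrix-vector product: for each item the user prefers, add
    // count * preference into the score of every co-occurring item.
    Map<Long, Float> scores = new HashMap<>();
    for (Map.Entry<Long, Float> pref : userPrefs.entrySet()) {
      Map<Long, Integer> row = cooccurrence.get(pref.getKey());
      if (row == null) {
        continue;
      }
      for (Map.Entry<Long, Integer> co : row.entrySet()) {
        scores.merge(co.getKey(), co.getValue() * pref.getValue(), Float::sum);
      }
    }

    // Drop items the user already knows; the rest, ranked by score,
    // are the recommendations.
    scores.keySet().removeAll(userPrefs.keySet());
    System.out.println(scores);
  }

  private static void put(Map<Long, Map<Long, Integer>> m,
                          long a, long b, int count) {
    // Co-occurrence is unordered, so store each count symmetrically.
    m.computeIfAbsent(a, k -> new HashMap<>()).put(b, count);
    m.computeIfAbsent(b, k -> new HashMap<>()).put(a, count);
  }
}

The per-user cost comes from touching one matrix row per rated item, which
is why the per-user time matters once you run it over millions of users.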
