Yeah, it's almost over, unfortunately. :) I tried this a while ago with a slope-one recommender and was only just able to match Netflix's current performance. I published some support code for people who wanted to play with it, but removed it from Mahout's copy as legacy code.
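
For anyone who hasn't seen slope-one, here is a rough sketch of the weighted variant in Python. It is only an illustration of the general idea, not the Taste/Mahout code or the support code mentioned above; the function names and the tiny ratings table are made up for the example.

```python
from collections import defaultdict

# Hypothetical toy data: user -> {item: rating}
EXAMPLE_RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}


def train_slope_one(ratings):
    """Compute the average rating difference for every co-rated item pair.

    Returns (diffs, counts) where diffs[i][j] is the mean of (r_i - r_j)
    over users who rated both i and j, and counts[i][j] is how many did.
    """
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diffs[i][j] += r_i - r_j
                    counts[i][j] += 1
    # Turn the summed differences into averages.
    for i in diffs:
        for j in diffs[i]:
            diffs[i][j] /= counts[i][j]
    return diffs, counts


def predict(user_ratings, item, diffs, counts):
    """Weighted slope-one estimate of one user's rating for `item`.

    Each item j the user has rated contributes (diffs[item][j] + r_j),
    weighted by the number of users who co-rated item and j.
    Returns None when there is no overlap to base a prediction on.
    """
    numerator = 0.0
    denominator = 0
    for j, r_j in user_ratings.items():
        c = counts.get(item, {}).get(j, 0)
        if j != item and c > 0:
            numerator += (diffs[item][j] + r_j) * c
            denominator += c
    return numerator / denominator if denominator else None
```

With the toy table, `predict(EXAMPLE_RATINGS["carol"], "A", *train_slope_one(EXAMPLE_RATINGS))` comes out at 13/3, roughly 4.33. The appeal is that training is a single pass over co-rated pairs, although the pair table grows quadratically with the number of items.
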
I didn't really have time to investigate more.

Some of the insights that have fallen out of the competition are pretty great. For example, one person took advantage of a sort of "memory effect" in ratings: people tend to over-rate movies at some times and under-rate them at others. So if you correct for this (a 5-star rating in a run of 5-star ratings may not mean as much as a 5-star rating in the middle of several 2-star ratings), you get much better performance. This nugget of knowledge may be specific to Netflix, not sure. But it was interesting.

On Wed, Sep 3, 2008 at 9:28 AM, deneche abdelhakim <[EMAIL PROTECTED]> wrote:
> I came across the following competition
>
> http://www.netflixprize.com/index
>
> It's about recommender systems, so I think it's Taste stuff. The training
> dataset consists of more than 100M ratings.
>
> ----- Original Message ----
> From: Josh Myer <[EMAIL PROTECTED]>
> To: mahout-dev@lucene.apache.org
> Sent: Wednesday, July 30, 2008, 6:19:25 PM
> Subject: Re: FYI Cloud Computing Resources
>
> On Wed, Jul 30, 2008 at 11:26:29AM -0400, Grant Ingersoll wrote:
>> http://research.yahoo.com/node/2328
>>
>> It _MAY_ (stressed, emphasized, etc.) be possible for Mahouters (or
>> are we just Mahouts?) to get some access to these resources. One big
>> question is where can we get some fairly large data sets (large, but
>> not super large, I think, but am not sure)
>>
>> If you have ideas, etc. please let us know.
>>
>
> It's worth plugging (theinfo), http://theinfo.org/. It's a project to
> collect references to datasets, and may help here. Unfortunately, it
> seems to be laggy at the moment. I'll poke Aaron about that =)
>
> HtH,
> --
> Josh Myer
> [EMAIL PROTECTED]