Thanks Owen.

My next question is about this step from the tutorial:

Edit recommender.properties and fill in the recommender.class:
recommender.class=org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender


It seems this is already in the file, or only needs to be uncommented. My
question is: what is "GroupLensRecommender"? I googled it but didn't find
any reference except the tutorial.

Brian



On Mon, Oct 19, 2009 at 8:47 AM, Sean Owen <[email protected]> wrote:

> You got the "100K" data set which is quite different for some reason.
> Make sure you nab the 1M data set and the instructions will make
> sense.
>
> The target directory should exist in the tarball, since it exists in
> SVN, but oops maybe it doesn't for some reason. In any event you can
> just create it.
>
> Yes the underlying FileDataModel is pretty flexible. The javadoc
> should cover it pretty well -- tab or comma separated, needs the first
> three fields to be user ID, item ID, pref value (if applicable).
>
> It will read the 'u.data' file just fine. However, the example code
> this tutorial references is using a custom implementation, since the
> 1M and 10M data set files are using a strange format that needs
> something customized. You could easily dig in to the code and swap in
> FileDataModel for GroupLensDataModel if you want to use the 100K data
> set.
>
> The other data is pretty domain-specific and is not directly relevant
> to a recommender engine. So no, there is nothing that would do anything
> with 'u.item' for instance. However, it would be pretty easy to write,
> for example, a custom ItemSimilarity implementation that reads this
> and deduces some notion of similarity from genre. You could then plug
> that in to a GenericItemBasedRecommender for a fast, and perhaps quite
> effective, recommender.
>
> Ah perhaps this will be an example in the book ... :)
>
> Sean
>
>
> On Mon, Oct 19, 2009 at 4:27 PM, Brian Wolf <[email protected]> wrote:
> > Hi,
> > I discovered and downloaded Mahout today. Maybe it's just giddiness, but
> > can you help me?
> >
> >
> > This is from the tutorial at http://lucene.apache.org/mahout/taste.html
> > "
> >
> >   1. Download the "1 Million MovieLens Dataset" from
> >      http://www.grouplens.org/.
> >   2. Unpack the archive and copy movies.dat and ratings.dat to
> >      trunk/taste-web/src/main/resources/org/apache/mahout/cf/taste/example/grouplens
> >      under the Mahout distribution directory.
> >
> > "
> >
> > I downloaded the MovieLens data set, but there is no "movies.dat" or
> > "ratings.dat". Are the correct files u.data and u.item?
> > I haven't found any documentation on file formats. There are other things
> > confusing to new users: when I downloaded the gz file and built it with
> > Maven following the instructions, the directory was only partly built.
> > However, when I checked out with svn, the full directory structure was
> > built.
> >
> > Can Taste incorporate other data files as well, like the ones listed
> > below (i.e. demographic data, etc.)? Where can I find documentation about
> > data file formats accepted by Taste, or do I need to dig into the code?
> >
> >
> > Thank you,
> > Brian Wolf
> > developer
> > gOgO deVelopment, ltd
> > Sedona, AZ
> >
> > u.data     -- The full u data set, 100000 ratings by 943 users on 1682
> >              items.
> >              Each user has rated at least 20 movies.  Users and items are
> >              numbered consecutively from 1.  The data is randomly
> >              ordered. This is a tab separated list of
> >              user id | item id | rating | timestamp.
> >              The time stamps are unix seconds since 1/1/1970 UTC
> >
> > u.info     -- The number of users, items, and ratings in the u data set.
> >
> > u.item     -- Information about the items (movies); this is a tab
> >              separated list of
> >              movie id | movie title | release date | video release date |
> >              IMDb URL | unknown | Action | Adventure | Animation |
> >              Children's | Comedy | Crime | Documentary | Drama | Fantasy |
> >              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
> >              Thriller | War | Western |
> >              The last 19 fields are the genres, a 1 indicates the movie
> >              is of that genre, a 0 indicates it is not; movies can be in
> >              several genres at once.
> >              The movie ids are the ones used in the u.data data set.
> >
> > u.genre    -- A list of the genres.
> >
> > u.user     -- Demographic information about the users; this is a tab
> >              separated list of
> >              user id | age | gender | occupation | zip code
> >              The user ids are the ones used in the u.data data set.
> >
> > u.occupation -- A list of the occupations.
> >
>
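
To make the u.data layout quoted above concrete, here is a minimal sketch in plain Java that splits one line into user ID, item ID, rating and timestamp. The RatingLine class and the sample values are made up for illustration. This is not Mahout code, and FileDataModel does its own parsing.

```java
/** One rating parsed from a u.data line (hypothetical helper, not a Mahout class). */
final class RatingLine {
    final long userId;
    final long itemId;
    final float rating;
    final long timestamp; // unix seconds, per the data set description

    RatingLine(long userId, long itemId, float rating, long timestamp) {
        this.userId = userId;
        this.itemId = itemId;
        this.rating = rating;
        this.timestamp = timestamp;
    }

    /** Parses "user id TAB item id TAB rating TAB timestamp". */
    static RatingLine parse(String line) {
        String[] fields = line.trim().split("\t");
        if (fields.length < 4) {
            throw new IllegalArgumentException("expected 4 tab-separated fields: " + line);
        }
        return new RatingLine(
                Long.parseLong(fields[0]),
                Long.parseLong(fields[1]),
                Float.parseFloat(fields[2]),
                Long.parseLong(fields[3]));
    }

    public static void main(String[] args) {
        // Sample values are invented, not taken from the real data set.
        RatingLine r = parse("1\t10\t4\t880000000");
        System.out.println("user " + r.userId + " rated item " + r.itemId + " as " + r.rating);
    }
}
```

Only the first three fields matter for a recommender as described above; the timestamp is parsed here just to show the full layout.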

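The idea of deducing item similarity from genre, mentioned in the reply above, could look roughly like the sketch below. It assumes Jaccard overlap of the 19 genre flags as the similarity measure (an arbitrary choice, nothing in the thread prescribes it), takes the field delimiter as a parameter since the actual separator should be checked against the file, and uses a hypothetical GenreSimilarity class with invented sample rows. Wiring this into Mahout's ItemSimilarity interface is deliberately left out.

```java
import java.util.regex.Pattern;

/** Genre-overlap similarity between two u.item rows (hypothetical, not a Mahout class). */
final class GenreSimilarity {
    static final int GENRE_COUNT = 19; // "the last 19 fields are the genres"

    /** Extracts the trailing 19 genre flags from one row; "1" means the movie has that genre. */
    static boolean[] genreFlags(String line, String delimiter) {
        // Default split drops trailing empty fields, so a trailing delimiter is harmless.
        String[] fields = line.split(Pattern.quote(delimiter));
        if (fields.length < GENRE_COUNT) {
            throw new IllegalArgumentException("too few fields: " + line);
        }
        boolean[] flags = new boolean[GENRE_COUNT];
        int offset = fields.length - GENRE_COUNT;
        for (int i = 0; i < GENRE_COUNT; i++) {
            flags[i] = "1".equals(fields[offset + i].trim());
        }
        return flags;
    }

    /** Jaccard similarity: shared genres divided by genres present in either movie. */
    static double jaccard(boolean[] a, boolean[] b) {
        int shared = 0;
        int either = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] && b[i]) shared++;
            if (a[i] || b[i]) either++;
        }
        return either == 0 ? 0.0 : (double) shared / either;
    }

    public static void main(String[] args) {
        // Invented rows: movie A is Action + Comedy, movie B is Comedy + Drama.
        String a = "1|Movie A|01-Jan-1995||http://example.invalid/a"
                + "|0|1|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0";
        String b = "2|Movie B|01-Jan-1995||http://example.invalid/b"
                + "|0|0|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0";
        System.out.println(jaccard(genreFlags(a, "|"), genreFlags(b, "|")));
    }
}
```

A real implementation would precompute the flags per item ID once and look them up, rather than re-parsing rows on every similarity call.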