Hi,

I discovered and downloaded Mahout today. Maybe it's just giddiness, but can you help me with this step from the tutorial at http://lucene.apache.org/mahout/taste.html:

"1. Download the "1 Million MovieLens Dataset" from http://www.grouplens.org/.
2. Unpack the archive and copy ->movies.dat<- and ->ratings.dat<- to trunk/taste-web/src/main/resources/org/apache/mahout/cf/taste/example/grouplens under the Mahout distribution directory."

I downloaded the MovieLens data set, but it contains no movies.dat or ratings.dat. Are the correct files u.data and u.item?

I haven't found any documentation on file formats. There are other things that are confusing to new users: when I downloaded the gz file and built it with Maven following the instructions, the directory was only partly built; however, when I checked the source out with svn, the full directory structure was built.

Can Taste incorporate other data files as well, like the ones listed below (demographic data, etc.)? Where can I find documentation about the data file formats accepted by Taste, or do I need to dig into the code?

Thank you,
Brian Wolf
developer
gOgO deVelopment, ltd
Sedona, AZ

u.data -- The full u data set, 100000 ratings by 943 users on 1682 items. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. This is a tab separated list of user id | item id | rating | timestamp. The time stamps are unix seconds since 1/1/1970 UTC.
u.info -- The number of users, items, and ratings in the u data set.
u.item -- Information about the items (movies); this is a tab separated list of movie id | movie title | release date | video release date | IMDb URL | unknown | Action | Adventure | Animation | Children's | Comedy | Crime | Documentary | Drama | Fantasy | Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi | Thriller | War | Western | The last 19 fields are the genres, a 1 indicates the movie is of that genre, a 0 indicates it is not; movies can be in several genres at once. The movie ids are the ones used in the u.data data set.
u.genre -- A list of the genres.
u.user -- Demographic information about the users; this is a tab separated list of user id | age | gender | occupation | zip code. The user ids are the ones used in the u.data data set.
u.occupation -- A list of the occupations.
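
To make the file descriptions above concrete, here is a minimal Python sketch of how one u.data line and the genre flags of one u.item record could be read. It is based only on the description quoted above and is not anything Taste itself provides: the names Rating, parse_rating, genres_of, and GENRES are made up for illustration, and the tab separator is taken from the description, so the actual files may use a different delimiter.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple

# Genre names in the order given in the u.item description (19 flags).
GENRES = [
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


class Rating(NamedTuple):
    """One u.data record (hypothetical helper type)."""
    user_id: int
    item_id: int
    rating: int
    rated_at: datetime


def parse_rating(line: str, sep: str = "\t") -> Rating:
    """Parse 'user id, item id, rating, timestamp' from one u.data line.

    The separator is an assumption taken from the description above.
    """
    user_id, item_id, rating, ts = line.rstrip("\n").split(sep)
    # Timestamps are described as unix seconds since 1/1/1970 UTC.
    rated_at = datetime.fromtimestamp(int(ts), tz=timezone.utc)
    return Rating(int(user_id), int(item_id), int(rating), rated_at)


def genres_of(fields: List[str]) -> List[str]:
    """Return the genre names whose flag is '1' in an already-split u.item record.

    Only the last 19 fields are inspected, as the description says.
    """
    flags = fields[-len(GENRES):]
    return [name for name, flag in zip(GENRES, flags) if flag == "1"]
```

For example, parse_rating("196\t242\t3\t881250949") yields a Rating for user 196, item 242, with rating 3. Taking the already-split fields in genres_of keeps the delimiter question out of the genre logic, so the same function works whichever separator the item file turns out to use.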
