>> I signed up for the Netflix Prize. (www.netflixprize.com)
>> and downloaded their data and have imported it into PostgreSQL.
>> Here is how I created the table:
>
> I signed up as well, but have the table as follows:
>
> CREATE TABLE rating (
>   movie  SMALLINT NOT NULL,
>   person INTEGER NOT NULL,
>   rating SMALLINT NOT NULL,
>   viewed DATE NOT NULL
> );
>
> I also recommend not loading the entire file until you get further
> along in the algorithm solution. :)
>
> Not that I have time to really play with this....

As luck would have it, I wrote a recommendation system based on music ratings a few years ago. After reading the NYT article, it seems as though one or more of the guys behind "Net Perceptions" is either helping them or built their system; I'm not sure which. I wrote my system because Net Perceptions was too slow and did a lousy job.

I think the notion of "communities" in general is an interesting study in statistics, but everything I've seen in the form of bad recommendations shows that while [N] people may share certain tastes, that doesn't necessarily mean that what one likes, the others will like too. This is especially flawed with movie rentals because it is seldom a 1:1 ratio of movies to people. There are often multiple people in a household. Also, movies are almost always rented for multiple people to watch.

Anyway, good luck! (Not better than me, of course :-)