Making an assumption here: as Netflix are being sued for releasing the dataset, I bet we will find it very difficult to get hold of it in the future.
On 12 Mar 2010, at 15:13, Tamas Jambor wrote:

> Does anyone know if it is possible to get the Netflix quiz set? They said
> they would release it after the competition ends.
>
> T
>
> On 10/03/2010 04:18, Jake Mannix wrote:
>> On Tue, Mar 9, 2010 at 7:49 PM, Robin Anil wrote:
>>
>>> http://warsteiner.db.cs.cmu.edu/db-site/Datasets/graphData/
>>> Seems like there are plenty of interesting datasets here to try Mahout on.
>>> There is even a p2p network graph, 790MB compressed. Sounds like a good test
>>> matrix for the decomposer stuff.
>>
>> Three words: Twitter social graph:
>> http://an.kaist.ac.kr/traces/WWW2010.html
>> 6GB compressed, 60M x 60M sparse matrix.
>>
>> I've pulled the torrent and will put sequence files of vectors in some S3
>> buckets once I get them processed. This is a matrix with a good 1.47B nonzero
>> entries, and is publicly available. Not record-breaking, but pretty darn huge.
>>
>> -jake
