On Tue, Mar 9, 2010 at 7:49 PM, Robin Anil <[email protected]> wrote:
> http://warsteiner.db.cs.cmu.edu/db-site/Datasets/graphData/
> Seems like there are plenty of interesting datasets here to try Mahout on.
> There is even a p2p network graph. 790MB compressed. Sounds like a good test
> matrix for the decomposer stuff.
>

Three words: Twitter social graph:

http://an.kaist.ac.kr/traces/WWW2010.html

6GB compressed, 60M x 60M sparse matrix. I've pulled the torrent and will put
sequence files of vectors in some S3 buckets once I get them processed.

This is a matrix with a good 1.47B nonzero entries, and it is publicly
available. Not record-breaking, but pretty darn huge.

-jake
