These guys: http://delivery.acm.org/10.1145/1460000/1459718/a18-vigna.pdf?key1=1459718&key2=4070317621&coll=GUIDE&dl=GUIDE&CFID=77555530&CFTOKEN=13940667
say this:

> We present experiments over a collection with 3.6 billions of postings---two orders of magnitudes larger than any published experiment in the literature.

My impression is that Mahout on about 100 machines is ready to break this record with Jake's latest code. The stochastic decomposition should make it even more plausible.

The hardest part will be to find reasonable data with > 4 billion non-zero entries. At 0.01% sparsity, this is roughly a square matrix with 5 million rows and columns. Jake, your social graph should be much larger than that.

-- Ted Dunning, CTO DeepDyve
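
For anyone who wants to redo the sizing estimate with other numbers, here is a minimal sketch of the arithmetic. It assumes "0.01% sparsity" means that 0.01% of the cells are non-zero (a density of 1e-4) and that the matrix is square; the function name `square_side` is made up for illustration and is not part of Mahout. Under those assumptions, 4 billion non-zeros works out to a side of about 6.3 million, the same order of magnitude as the rough 5 million figure above.

```python
import math


def square_side(nonzeros: float, density: float) -> int:
    """Smallest n such that an n x n matrix at `density` holds `nonzeros` entries.

    `density` is the fraction of cells that are non-zero (assumed reading of
    "0.01% sparsity" is density = 1e-4).
    """
    if not 0 < density <= 1:
        raise ValueError("density must be in (0, 1]")
    if nonzeros < 0:
        raise ValueError("nonzeros must be non-negative")
    total_cells = nonzeros / density        # n * n
    return math.ceil(math.sqrt(total_cells))


if __name__ == "__main__":
    # 4 billion non-zero entries at 0.01% density
    print(square_side(4e9, 1e-4))
```

Changing the density argument shows how quickly the required dimensions shrink for denser data: at 1% density the same 4 billion entries fit in a square matrix with well under a million rows.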