These guys:

http://delivery.acm.org/10.1145/1460000/1459718/a18-vigna.pdf?key1=1459718&key2=4070317621&coll=GUIDE&dl=GUIDE&CFID=77555530&CFTOKEN=13940667

say this:

   > We present experiments over a collection with 3.6 billions of
postings---two orders of magnitudes larger than any published experiment in
the literature.

My impression is that Mahout on about 100 machines is ready to break this
record with Jake's latest code.  The stochastic decomposition should make it
even more plausible.

The hardest part will be finding reasonable data with more than 4 billion
non-zero entries.  At 0.01% density (one entry in 10,000 is non-zero), that
is roughly a square matrix with 6 million rows and columns.
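
A quick back-of-the-envelope check of that size estimate, as a sketch only.
It assumes the 0.01% figure means the fraction of entries that are non-zero,
and the helper names are made up for illustration.  For a square n x n
matrix, n^2 * density = non-zeros, so n = sqrt(non-zeros / density), which
lands at a few million rows and columns:

```python
import math


def square_side(nonzeros: float, density: float) -> int:
    """Smallest n such that an n x n matrix at `density` holds `nonzeros` entries.

    `density` is the fraction of entries that are non-zero (0.01% -> 1e-4).
    """
    return math.ceil(math.sqrt(nonzeros / density))


def nonzeros_for_side(n: int, density: float) -> float:
    """Expected number of non-zero entries in an n x n matrix at `density`."""
    return n * n * density


if __name__ == "__main__":
    target = 4e9      # non-zero entries needed to beat the published record
    density = 1e-4    # 0.01% of entries non-zero (assumed reading of the text)
    n = square_side(target, density)
    print(f"side length needed: {n:,}")
    print(f"non-zeros at that size: {nonzeros_for_side(n, density):,.0f}")
```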

Jake, your social graph should be much larger than that.

-- 
Ted Dunning, CTO
DeepDyve
