+1. I'm ready. What do we need? Perf tuning? Cluster setup? Amazon credits? Someone to pay for the machines, or do we pay out of our own pockets?
Robin

On Fri, Feb 26, 2010 at 1:20 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> These guys:
>
> http://delivery.acm.org/10.1145/1460000/1459718/a18-vigna.pdf?key1=1459718&key2=4070317621&coll=GUIDE&dl=GUIDE&CFID=77555530&CFTOKEN=13940667
>
> say this:
>
> We present experiments over a collection with 3.6 billions of
> postings---two orders of magnitudes larger than any published experiment in
> the literature.
>
> My impression is that Mahout on about 100 machines is ready to break this
> record with Jake's latest code. The stochastic decomposition should make it
> even more plausible.
>
> The hardest part will be to find reasonable data with > 4 billion non-zero
> entries. At 0.01% sparsity, this is roughly a square matrix with 5 million
> rows and columns.
>
> Jake, your social graph should be much larger than that.
>
> --
> Ted Dunning, CTO
> DeepDyve
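
For anyone sizing a candidate data set, here is a rough sketch of the arithmetic behind the matrix estimate in the quoted message. It assumes "0.01% sparsity" means that 0.01% of the entries are non-zero and that the matrix is square; the function names are made up for illustration and are not part of Mahout.

```python
import math


def side_for_nonzeros(nonzeros: float, density: float) -> float:
    """Side length n of a square n x n matrix that holds `nonzeros`
    non-zero entries when a fraction `density` of entries is non-zero."""
    return math.sqrt(nonzeros / density)


def nonzeros_for_side(side: float, density: float) -> float:
    """Expected number of non-zero entries in a square matrix of the
    given side length at the given density."""
    return density * side * side


# Assumption: 0.01% of entries are non-zero.
DENSITY = 0.0001

if __name__ == "__main__":
    target = 4_000_000_000  # "> 4 billion non-zero entries"
    print("side needed for target:", round(side_for_nonzeros(target, DENSITY)))
    print("non-zeros at 5M x 5M:", round(nonzeros_for_side(5_000_000, DENSITY)))
```

Under these assumptions, 4 billion non-zeros needs a side of about 6.3 million, and a 5 million by 5 million matrix holds about 2.5 billion non-zeros, so "roughly 5 million" is the right order of magnitude but a little short of the 3.6 billion postings in the paper.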