If you're willing to live with a social network graph for this example, then the Twitter social graph is even bigger than LJ and still public. It is available as a torrent from <http://an.kaist.ac.kr/traces/WWW2010.html>; I've also put it on S3 and just need to make it public at some point. It has 1.47 billion connections on 47 million nodes.

-jake

On Fri, May 7, 2010 at 8:38 AM, Sean Owen <[email protected]> wrote:

> Cool, yeah I'm looking for something even larger, since this is small
> enough that processing it easily fits on one computer. The chapter in
> question is about distributing via Hadoop.
>
> My current next-best option, if it can be used, is the LiveJournal
> network data here:
> http://snap.stanford.edu/data/index.html
>
> On Fri, May 7, 2010 at 4:29 PM, Pedro Oliveira <[email protected]> wrote:
> > This dataset seems to have a few million <user, artist, plays> triples from
> > last.fm:
> > http://mtg.upf.edu/node/1671
