I like the idea of keeping the messages out of the vertices. There is a lot of unneeded data copying and GC going on, and if this eliminates some of it, that would be fantastic. I think it would be a big help memory-wise throughout the whole job run.
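To make the idea a bit more concrete, here is a rough Python sketch. It is not Giraph's actual API, just an illustration under my own assumptions: a hypothetical `MessageStore` that keeps messages in a per-worker map keyed by vertex id instead of inside each vertex object. It has an optional combiner so that each vertex ends up holding a single accumulated value. It is double-buffered, so that messages sent during one superstep are only visible in the next. The class and method names are made up. A real version would be in Java and would probably use primitive collections to cut down on object allocation.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Optional


class MessageStore:
    """Hypothetical per-worker message store, kept separate from vertex objects.

    Vertices read the messages delivered for the current superstep from
    `_current`, while messages sent during this superstep accumulate in
    `_incoming` and only become visible after `advance_superstep()`.
    """

    def __init__(self, combiner: Optional[Callable[[float, float], float]] = None):
        # Optional combiner (e.g. sum or min) that folds messages for the
        # same target into a single value as they arrive.
        self._combiner = combiner
        self._current: Dict[Hashable, List[float]] = {}
        self._incoming: Dict[Hashable, List[float]] = defaultdict(list)

    def send(self, target: Hashable, msg: float) -> None:
        bucket = self._incoming[target]
        if self._combiner is not None and bucket:
            # Combine in place: at most one stored value per target vertex.
            bucket[0] = self._combiner(bucket[0], msg)
        else:
            bucket.append(msg)

    def messages_for(self, vertex_id: Hashable) -> List[float]:
        # Returns an empty list when the vertex received no messages.
        return self._current.get(vertex_id, [])

    def advance_superstep(self) -> None:
        # Swap buffers: what was sent last superstep becomes readable now.
        self._current = dict(self._incoming)
        self._incoming = defaultdict(list)
```

With a summing combiner, `store.send("v1", 1.0)` followed by `store.send("v1", 2.5)` leaves a single stored value `3.5` for `v1` after `advance_superstep()`, and the vertex objects themselves never hold a message list.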
On Fri, Aug 3, 2012 at 4:03 AM, Gianmarco De Francisci Morales <g...@apache.org> wrote:

> Hi,
>
> > >Are you saying that out-of-core is faster than hitting memory boundaries
> > >(i.e. GC)? It is a bit tough to imagine that out-of-core beats in-core
> > >=).
> >
> > That's the only explanation I could think of. Honestly, it sounds wrong to
> > me too, but those are the results I keep getting. If someone has a better
> > one I'd love to hear it :-)
> >
>
> I am not surprised.
> Streaming sequentially from a disk is faster than random reading from
> memory [1].
> Add the GC overhead, and you get an explanation for your results.
>
> [1] The Pathologies of Big Data,
> http://queue.acm.org/detail.cfm?id=1563874
>
> Cheers,
> --
> Gianmarco