Yes, that is definitely the direction you may want to take at a certain point. It is basically what Stanford GPS does as well, and Stratosphere too.
On Friday, August 17, 2012, Alessandro Presta wrote:

> I think at that point it would be worth having a new logical place for
> vertex/edge representation at worker- or partition-level.
> Avery had some ideas about this.
>
> Basically right now we're giving the user the freedom (and responsibility)
> to choose a representation (both in-memory and for serialization), but
> another way to go would be to take care of all that at infrastructure
> level and expose only one Vertex class (where the user only defines the
> computation details and everything else is abstracted away). Then we could
> play around with compact representations and even more disruptive
> strategies (like streaming the graph/messages and re-using objects).
>
> On 8/17/12 2:30 PM, "Gianmarco De Francisci Morales" <g...@apache.org>
> wrote:
>
> >I was under the impression that 100k was the upper limit to make things
> >work without crashing.
> >
> >In any case, if one wanted to use a compressed memory representation by
> >aggregating different edge lists together, could one use the worker
> >context as a central point of access to the compressed graphs?
> >I can imagine a vertex class that has only the ID and uses the worker
> >context to access its edge list (i.e. it is only a client to a central
> >per-machine repository).
> >Vertexes in the same partition would share this data structure.
> >
> >Is there any obvious technical fallacy in this scheme?
> >
> >Cheers,
> >--
> >Gianmarco
> >
> >
> >On Fri, Aug 17, 2012 at 3:18 PM, Alessandro Presta
> ><alessan...@fb.com> wrote:
> >
> >> The example where we actually go out of memory was with 500K vertices
> >> and 500M edges, but yes, as a general rule we should strive to reduce
> >> our memory footprint in order to push the point where we need to go
> >> out of core as far away as possible.
> >>
> >> On 8/17/12 2:11 PM, "Gianmarco De Francisci Morales" <g...@apache.org>
> >> wrote:
> >>
> >> >Very interesting.
> >> >
> >> >On a side note, a graph with 100k vertices and 100M edges is largish
> >> >but not that big after all.
> >> >If it does not fit in 10+ GB of memory, it means that each edge
> >> >occupies around 100B (amortizing the cost of the vertex over the
> >> >edges). In my opinion this deserves some thought.
> >> >If memory is an issue, why not think about compressed memory
> >> >structures, at least for common graph formats?
> >> >
> >> >Cheers,
> >> >--
> >> >Gianmarco
> >> >
> >> >
> >> >On Wed, Aug 15, 2012 at 11:20 PM, Eli Reisman
> >> ><initialcont...@gmail.com> wrote:
> >> >
> >> >> Great metrics, this made a very interesting read, and great code
> >> >> too, as always. This must have been a lot of work. I like the idea
> >> >> of eliminating the extra temporary storage data structures where
> >> >> possible, even when not going out-of-core. I think that, plus
> >> >> avoiding extra object creation during the workflow, can still do a
> >> >> lot for an in-core job's memory profile. This is looking really
> >> >> good, and it sounds like with the config options it's also
> >> >> pluggable depending on your hardware situation, so it sounds great
> >> >> to me. Great work!
> >> >>
> >> >> On Wed, Aug 15, 2012 at 12:23 PM, Alessandro Presta (JIRA)
> >> >> <j...@apache.org> wrote:
> >> >>
> >> >> > [
> >> >> > https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435437#comment-13435437
> >> >> > ]
> >> >> >
> >> >> > Alessandro Presta commented on GIRAPH-249:
> >> >> > ------------------------------------------
> >> >> >
> >> >> > Thanks Claudio, good observation.
> >> >> > You got me curious so I quickly ran a shortest paths benchmark.
> >> >> >
> >> >> > 500k vertices, 100 edges/vertex, 10 workers
> >> >> >
> >> >> > This is with trunk:
> >> >> >
> >> >> > {code}
> >> >> > hadoop jar giraph-trunk.jar
> >> >> > org.apache.giraph.benchmark.ShortestPathsBenchmark
> >> >> > -Dgiraph.useN

--
Claudio Martella
claudio.marte...@gmail.com
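[Editor's note: Gianmarco's back-of-the-envelope figure in the thread (100k vertices, 100M edges not fitting in 10+ GB implies roughly 100 bytes per edge, with the vertex cost amortized over the edges) can be reproduced with a few lines; all numbers below come straight from the thread.]

```java
// Reproduces the per-edge memory estimate from the thread:
// 10 GB spread over 100M edges, with the 100k vertices' cost
// amortized over their edges (1000 edges/vertex on average).
public class EdgeFootprint {
    public static void main(String[] args) {
        long vertices = 100_000L;
        long edges = 100_000_000L;
        long memoryBytes = 10L * 1024 * 1024 * 1024; // 10 GB

        double edgesPerVertex = (double) edges / vertices;
        double bytesPerEdge = (double) memoryBytes / edges;

        System.out.printf("avg degree: %.0f edges/vertex%n", edgesPerVertex);
        System.out.printf("footprint:  ~%.0f bytes/edge%n", bytesPerEdge);
        // prints: avg degree: 1000 edges/vertex
        //         footprint:  ~107 bytes/edge
    }
}
```

At ~107 bytes per edge, most of the footprint is Java object overhead (headers, references, boxed IDs) rather than payload, which is what motivates the compressed-representation discussion above.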
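[Editor's note: Gianmarco's "vertex as a client to a central per-machine repository" idea could be sketched roughly as below. All names here (EdgeRepository, LightVertex, edgesOf) are hypothetical illustrations, not existing Giraph API; in Giraph the repository would presumably hang off the WorkerContext.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Per-worker repository holding the (possibly compressed) edge lists
// for every vertex in its partitions; shared by all vertices there.
class EdgeRepository {
    private final Map<Long, long[]> adjacency = new ConcurrentHashMap<>();

    void putEdges(long vertexId, long[] targets) {
        adjacency.put(vertexId, targets);
    }

    long[] edgesOf(long vertexId) {
        return adjacency.getOrDefault(vertexId, new long[0]);
    }
}

// A vertex that stores only its ID; edge access is delegated to the
// shared repository (standing in for the worker context).
class LightVertex {
    private final long id;
    private final EdgeRepository repo;

    LightVertex(long id, EdgeRepository repo) {
        this.id = id;
        this.repo = repo;
    }

    long[] edges() {
        return repo.edgesOf(id);
    }
}

public class SharedEdgeListDemo {
    public static void main(String[] args) {
        EdgeRepository repo = new EdgeRepository();
        repo.putEdges(1L, new long[] {2L, 3L});

        LightVertex v = new LightVertex(1L, repo);
        System.out.println(v.edges().length); // prints 2
    }
}
```

The point of the design is that per-vertex state shrinks to an ID plus a reference, so the repository is free to pack adjacency data into a compact shared structure behind the scenes.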