Hello, just a few in-line comments regarding the simplification of vertex classes.
In my opinion the proposed change might exclude all typed graphs, and all Semantic Web style processing, from Giraph.

On 17 Aug 2012, at 14:30, Gianmarco De Francisci Morales wrote:

> In any case, if one wanted to use a compressed memory representation by
> aggregating different edge lists together, could one use the worker context
> as a central point of access to the compressed graphs?
> I can imagine a vertex class that has only the ID and uses the worker
> context to access its edge list (i.e. it is only a client to a central
> per-machine repository).
> Vertexes in the same partition would share this data structure.

In the current vertex class signature, every user vertex can choose to have a complex class to hold the state of the vertex. Will that capability be gone with this proposed simplification of a vertex to only hold an id and a list of neighbour vertices?

While most of the popular graph algorithms only take the graph structure itself into account, there are types of algorithm which also take the semantics of the graph, of a node and of an edge into account. Basically everything from the area of Semantic Web graph analysis falls into this category, and one specific type of algorithm is spreading activation.

In a nutshell, spreading activation is a breadth-first search which is guided by the semantics of the vertices and edges. An example: return all persons and posts which are somehow related to this one person. In addition, ignore all vertices which are not persons or posts, and give twice as much weight in the ranking to properties from the music domain (all other properties have normal weight).

If semantics cannot be stored as part of a vertex or an edge, then this would require an external database lookup for each compute() call to a vertex. That would basically eliminate all reasons to use Giraph for this kind of algorithm.

> Is there any obvious technical fallacy in this scheme?
Not a technical fallacy, but I would argue that a lot will be lost by not giving developers a mechanism for including custom state in their vertex. Of course, developers need to be aware that this will increase the memory footprint of their objects, and I guess serialising/deserialising of strings will be a huge issue. But that should not be a reason to completely exclude such algorithms from using Giraph, or to exclude any kind of typed graph or semantic network from using Giraph.
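
For illustration, the spreading activation example above could be sketched roughly as follows. This is plain Java and deliberately does not use the Giraph API: the Vertex and Edge classes, the spread() method, the decay factor, the property names and the sample graph are all assumptions made up for this sketch. It also assumes that "ignoring" a vertex of the wrong type means neither scoring it nor traversing through it. The point is only that the vertex type and the edge property are read locally on every hop, which is exactly the custom state that would otherwise need an external lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SpreadingActivationSketch {

    /** Hypothetical typed edge: the property name carries the edge semantics. */
    static final class Edge {
        final String target;
        final String property;
        Edge(String target, String property) {
            this.target = target;
            this.property = property;
        }
    }

    /** Hypothetical typed vertex: the type is custom state stored in the vertex. */
    static final class Vertex {
        final String type;
        final List<Edge> edges = new ArrayList<>();
        Vertex(String type) { this.type = type; }
        Vertex edge(String target, String property) {
            edges.add(new Edge(target, property));
            return this;
        }
    }

    /**
     * Level-by-level (breadth-first) spreading activation from one start vertex.
     * Each hop passes on activation * propertyWeight * decay; properties without
     * an explicit weight count as 1.0. Returns the score of every reached vertex
     * of an accepted type, excluding the start vertex.
     */
    static Map<String, Double> spread(Map<String, Vertex> graph, String start, int maxHops,
                                      Set<String> acceptedTypes,
                                      Map<String, Double> propertyWeights, double decay) {
        Map<String, Double> scores = new HashMap<>();
        if (!graph.containsKey(start)) {
            return scores;
        }
        Set<String> visited = new HashSet<>();
        visited.add(start);
        Map<String, Double> frontier = new HashMap<>();
        frontier.put(start, 1.0);

        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, Double> active : frontier.entrySet()) {
                for (Edge e : graph.get(active.getKey()).edges) {
                    Vertex target = graph.get(e.target);
                    // Skip dangling edges, already reached vertices and unwanted types.
                    if (target == null || visited.contains(e.target)
                            || !acceptedTypes.contains(target.type)) {
                        continue;
                    }
                    double weight = propertyWeights.getOrDefault(e.property, 1.0);
                    next.merge(e.target, active.getValue() * weight * decay, Double::sum);
                }
            }
            visited.addAll(next.keySet());
            scores.putAll(next);
            frontier = next;
        }
        return scores;
    }

    /** Made-up sample data: one person, a friend, two posts and a city. */
    static Map<String, Vertex> exampleGraph() {
        Map<String, Vertex> g = new HashMap<>();
        g.put("alice", new Vertex("Person")
                .edge("post1", "likesMusic")
                .edge("bob", "knows")
                .edge("berlin", "livesIn"));
        g.put("bob", new Vertex("Person").edge("post2", "wrote"));
        g.put("post1", new Vertex("Post"));
        g.put("post2", new Vertex("Post"));
        g.put("berlin", new Vertex("City"));
        return g;
    }
}
```

Calling spread(exampleGraph(), "alice", 2, {"Person", "Post"}, {"likesMusic": 2.0}, 0.5) on this sample scores post1 above bob because of the doubled music weight, and never touches berlin because it is neither a person nor a post.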
