Hello, just a few in-line comments regarding the simplification of vertex 
classes. 

In my opinion the proposed change might exclude all typed graphs, and all 
Semantic Web style processing, from Giraph. 

On 17 Aug 2012, at 14:30, Gianmarco De Francisci Morales wrote:

> In any case, if one wanted to use a compressed memory representation by
> aggregating different edge lists together, could one use the worker context
> as a central point of access to the compressed graphs?
> I can imagine a vertex class that has only the ID and uses the worker
> context to access its edge list (i.e. it is only a client to a central
> per-machine repository).
> Vertexes in the same partition would share this data structure.

In the current vertex class signature, every user vertex can choose to have a 
complex class to hold the state of the vertex. 

Will that capability be gone with this proposed simplification of a vertex to 
only hold an id and a list of neighbour vertices? 

While most of the popular graph algorithms only take the graph structure into 
account, there are types of algorithm which also take the semantics of the 
graph, of a node and of an edge into account. Basically everything from the 
area of Semantic Web graph analysis falls into this category, and one specific 
type of algorithm is spreading activation. 

In a nutshell, spreading activation is a breadth-first search which is guided 
by the semantics of the vertices and edges. 

An example: return all persons and posts which are somehow related to this one 
person. In addition, leave out all vertices which are not persons or posts, and 
give twice as much weight in the ranking to properties from the music domain 
(all other properties have normal weight). 
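
The example query can be sketched in plain Java as follows. This is only a 
minimal illustration and not Giraph code: the class and method names, the decay 
factor of 0.5, the hop limit and the tiny sample graph are all assumptions made 
for the sketch. What it tries to show is that the vertex type and the edge 
domain are read from state held locally with the vertex. 

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of spreading activation over a typed graph (not the Giraph API).
class SpreadingActivationSketch {
    enum VertexType { PERSON, POST, OTHER }

    // A typed edge: the "domain" is the semantic label of the property.
    static final class Edge {
        final String target;
        final String domain;
        Edge(String target, String domain) { this.target = target; this.domain = domain; }
    }

    // Vertex state: a type plus the outgoing typed edges.
    static final class Vertex {
        final VertexType type;
        final List<Edge> edges = new ArrayList<>();
        Vertex(VertexType type) { this.type = type; }
        Vertex edge(String target, String domain) { edges.add(new Edge(target, domain)); return this; }
    }

    static final double DECAY = 0.5; // assumed attenuation per hop

    // Breadth-first spread from `start`; each vertex is activated at the hop it is first reached.
    static Map<String, Double> rank(Map<String, Vertex> graph, String start, int maxHops) {
        Map<String, Double> activation = new HashMap<>();
        activation.put(start, 1.0);
        Map<String, Double> frontier = new HashMap<>(activation);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, Double> entry : frontier.entrySet()) {
                Vertex vertex = graph.get(entry.getKey());
                if (vertex == null) continue; // dangling edge target
                for (Edge edge : vertex.edges) {
                    if (activation.containsKey(edge.target)) continue; // already reached earlier
                    // Music properties count twice, all others have normal weight.
                    double weight = "music".equals(edge.domain) ? 2.0 : 1.0;
                    next.merge(edge.target, entry.getValue() * DECAY * weight, Double::sum);
                }
            }
            activation.putAll(next);
            frontier = next;
        }
        // Only persons and posts are returned; other vertices are traversed but left out.
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> entry : activation.entrySet()) {
            Vertex vertex = graph.get(entry.getKey());
            if (vertex == null || entry.getKey().equals(start)) continue;
            if (vertex.type == VertexType.PERSON || vertex.type == VertexType.POST) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    // Made-up sample data for the sketch.
    static Map<String, Vertex> sampleGraph() {
        Map<String, Vertex> graph = new HashMap<>();
        graph.put("alice", new Vertex(VertexType.PERSON)
                .edge("bob", "social").edge("song1", "music").edge("post1", "social"));
        graph.put("bob", new Vertex(VertexType.PERSON));
        graph.put("post1", new Vertex(VertexType.POST));
        graph.put("song1", new Vertex(VertexType.OTHER).edge("carol", "music"));
        graph.put("carol", new Vertex(VertexType.PERSON));
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(rank(sampleGraph(), "alice", 2));
    }
}
```

In this sketch the song vertex is traversed but does not appear in the result, 
and the person reached through it ranks higher than the directly connected 
ones because both edges on that path are from the music domain. 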

If semantics cannot be stored as part of a vertex or an edge, then this would 
require an external database lookup for each compute() call on a vertex. That 
would basically eliminate all reasons to use Giraph for this kind of algorithm. 

> Is there any obvious technical fallacy in this scheme?

Not a technical fallacy, but I would argue that a lot will be lost by not 
giving developers a mechanism for including custom state in their vertex. Of 
course, developers need to be aware that this will increase the memory 
footprint of their objects, and I guess serialising/deserialising of strings 
will be a huge issue.

But that should not be a reason to completely exclude such algorithms, or any 
kind of typed graph or semantic network, from using Giraph. 
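
On the footprint concern, a rough way to see the cost of string labels is to 
compare the serialised size of a vertex state that stores its type as a string 
with one that stores it as a one-byte code. The sketch below uses only 
java.io.DataOutputStream; the class name, the VertexType enum and the choice of 
a single double as the remaining state are assumptions for illustration, and 
whether a fixed vocabulary of types is acceptable depends on the application. 

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical comparison of two encodings for a typed vertex state.
class VertexStateSizeSketch {
    enum VertexType { PERSON, POST, OTHER }

    // Type stored as a string label: writeUTF emits a 2-byte length plus the encoded characters.
    static int sizeWithStringLabel(String label, double activation) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(label);
            out.writeDouble(activation);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Type stored as the enum ordinal in a single byte (assumes fewer than 128 types).
    static int sizeWithEnumCode(VertexType type, double activation) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(type.ordinal());
            out.writeDouble(activation);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        System.out.println("string label: " + sizeWithStringLabel("person", 1.0) + " bytes");
        System.out.println("enum code:    " + sizeWithEnumCode(VertexType.PERSON, 1.0) + " bytes");
    }
}
```

The difference per vertex is small, but it is multiplied by the number of 
vertices and edges and by every serialisation pass. 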
