[ 
https://issues.apache.org/jira/browse/GIRAPH-141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Presta updated GIRAPH-141:
-------------------------------------

    Attachment: GIRAPH-141.patch

This patch introduces support for multigraphs in Giraph, by:
- changing Vertex#initialize() and Vertex#setEdges() to take a collection of 
edges instead of a map
- adding two vertex implementations (MultiGraphEdgeListVertex and 
MultiGraphRepresentativeVertex) where addEdge() only appends the edge, instead 
of checking for duplicates

This is not only useful for representing multigraphs, but also for implementing 
addEdge() efficiently (critical for edge-based input when the out-degree can be 
really large).

Other minor changes:
- changed MutableVertex#addEdge() to get an Edge instead of an id and value, 
mainly for consistency with other code
- added PseudoRandomEdgeInputFormat and extended PageRankBenchmark to accept 
edge-based input
- various fixes on the way

I ran several benchmarks to compare this patch with the current version, and 
edge-based input with vertex-based input:

Graph: 10M vertices, 1B edges, 10 workers

EdgeListVertex + VertexInputFormat, current version:
Total (milliseconds) 118,377
Input superstep (milliseconds) 44,041
Superstep 0 (milliseconds) 20,970
Superstep 1 (milliseconds) 23,731
Superstep 2 (milliseconds) 24,269

EdgeListVertex + VertexInputFormat, patched version:
Total (milliseconds) 116,298
Input superstep (milliseconds) 40,441
Superstep 0 (milliseconds) 22,444
Superstep 1 (milliseconds) 24,164
Superstep 2 (milliseconds) 25,036

MultiGraphEdgeListVertex + EdgeInputFormat, patched version:
Total (milliseconds) 148,905
Input superstep (milliseconds) 72,425
Superstep 0 (milliseconds) 23,684
Superstep 1 (milliseconds) 25,580
Superstep 2 (milliseconds) 22,456

RepresentativeVertex + VertexInputFormat, current version:
Total (milliseconds) 111,450
Input superstep (milliseconds) 39,301
Superstep 0 (milliseconds) 21,615
Superstep 1 (milliseconds) 22,213
Superstep 2 (milliseconds) 21,840

RepresentativeVertex + VertexInputFormat, patched version:
Total (milliseconds) 106,512
Input superstep (milliseconds) 38,142
Superstep 0 (milliseconds) 20,812
Superstep 1 (milliseconds) 22,906
Superstep 2 (milliseconds) 21,250

MultiGraphRepresentativeVertex + EdgeInputFormat, patched version:
Total (milliseconds) 143,831
Input superstep (milliseconds) 75,030
Superstep 0 (milliseconds) 20,661
Superstep 1 (milliseconds) 21,406
Superstep 2 (milliseconds) 22,456
                
> multigraph support in giraph
> ----------------------------
>
>                 Key: GIRAPH-141
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-141
>             Project: Giraph
>          Issue Type: Improvement
>          Components: graph
>            Reporter: André Kelpe
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-141.patch
>
>
> The current vertex API only supports simple graphs, meaning that there can 
> only ever be one edge between two vertices. Many graphs like the road network 
> are in fact multigraphs, where many edges can connect two vertices at the 
> same time.
> Support for this could be added by introducing an Iterator<EdgeWritable> 
> getEdgeValue() or a similar construct. Maybe introducing a slim object like a 
> Connector between the edge and the vertex is also a good idea, so that you 
> could do something like:
> {code} 
> for (final Connector<EdgeWritable, VertexWritable> conn: getEdgeValues(){
>      final EdgeWritable edge = conn.getEdge();
>      final VertexWritable otherVertex = conn.getOther();
>      doInterestingStuff(otherVertex);
>      doMoreInterestingStuff(edge);
> }
> {code} 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to