[ https://issues.apache.org/jira/browse/GIRAPH-115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175700#comment-13175700 ]
jirapos...@reviews.apache.org commented on GIRAPH-115: ------------------------------------------------------ ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/3313/ ----------------------------------------------------------- Review request for giraph. Summary ------- Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component. I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data. I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests. Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges). This addresses bug GIRAPH-115. https://issues.apache.org/jira/browse/GIRAPH-115 Diffs ----- /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION /trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837 /trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION /trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION /trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION Diff: https://reviews.apache.org/r/3313/diff Testing ------- Thanks, Sebastian > Port of the HCC algorithm for identifying all connected components of a graph > ----------------------------------------------------------------------------- > > Key: GIRAPH-115 > URL: https://issues.apache.org/jira/browse/GIRAPH-115 > Project: Giraph > Issue Type: New Feature > Affects Versions: 0.70.0 > Reporter: Sebastian Schelter > > Port of the HCC algorithm that identifies connected components and assigns a > componented id (the smallest vertex id in the component) to each vertex. > The idea behind the algorithm is very simple: propagate the smallest vertex > id along the edges to all vertices of a connected component until > convergence. The number of supersteps necessary is equal to the length of the > maximum diameter of all components + 1 > The original Hadoop-based variant of this algorithm was proposed by Kang, > Charalampos, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale > Graphs", 2010 > http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira