nguyen duc tuan created SPARK-21815:
---------------------------------------

             Summary: Undeterministic group labeling within small connected component
                 Key: SPARK-21815
                 URL: https://issues.apache.org/jira/browse/SPARK-21815
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
    Affects Versions: 2.2.0, 1.6.3
            Reporter: nguyen duc tuan
            Priority: Trivial

Looking at the code at https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/LabelPropagation.scala#L61, when the number of vertices in each community is small and the number of iterations is large enough, all candidate labels end up with the same score. The winner among tied candidates then depends on their order in the set, so each vertex may be assigned a different community id. Breaking ties by ordering on vertexId solves the problem.

Sample code to reproduce this error:

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.graphx.lib.LabelPropagation
    import spark.implicits._  // needed for toDF outside spark-shell

    val vertices = spark.sparkContext.parallelize(Seq((1L, 1), (2L, 1)))
    val edges = spark.sparkContext.parallelize(Seq(Edge(1L, 2L, 1)))
    // g was not defined in the original report; presumably it is built from the two RDDs above
    val g = Graph(vertices, edges)
    val c = LabelPropagation.run(g, 5)
    c.vertices.map(x => (x._1, x._2)).toDF.show
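
The proposed fix can be sketched in plain Scala, without Spark. This is only an illustration of the tie-breaking idea, not the actual patch: it assumes each vertex receives a map from candidate label to count, and `TieBreak` and `pickLabel` are hypothetical names that do not exist in GraphX.

```scala
object TieBreak {
  // Same alias GraphX uses for vertex ids.
  type VertexId = Long

  // Hypothetical helper: pick the label with the highest count.
  // When several labels share the highest count, prefer the smallest
  // label id, so the result no longer depends on the iteration order
  // of the map. With no incoming labels, keep the current one.
  def pickLabel(current: VertexId, message: Map[VertexId, Long]): VertexId =
    if (message.isEmpty) current
    else message.toSeq.minBy { case (label, count) => (-count, label) }._1
}
```

With tied scores, for example `TieBreak.pickLabel(7L, Map(2L -> 3L, 1L -> 3L))`, the smaller label is chosen regardless of how the map happens to be ordered.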