nguyen duc tuan created SPARK-21815:
---------------------------------------

             Summary: Undeterministic group labeling within small connected component
                 Key: SPARK-21815
                 URL: https://issues.apache.org/jira/browse/SPARK-21815
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
    Affects Versions: 2.2.0, 1.6.3
            Reporter: nguyen duc tuan
            Priority: Trivial


Looking at the code at
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/LabelPropagation.scala#L61,
when the number of vertices in each community is small and the number of
iterations is large enough, all candidate labels end up with the same score.
The winner then depends on the iteration order of the collection holding the
candidates, so vertices in the same component can be assigned different
community ids. Breaking ties by ordering on vertexId solves the problem.
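
To illustrate the proposed tie-break, here is a minimal sketch that does not depend on Spark. The object name LabelTieBreak and the function pickLabel are hypothetical, and the input is assumed to be a map from candidate label (a vertex id) to its vote count; this is not the actual GraphX implementation, only one way to make the choice deterministic.

```scala
object LabelTieBreak {
  // Hypothetical helper: choose the label with the highest vote count and,
  // when several labels share that count, prefer the smallest vertex id.
  // Assumes `votes` is non-empty and counts are non-negative.
  def pickLabel(votes: Map[Long, Long]): Long =
    votes.minBy { case (label, count) => (-count, label) }._1
}
```

With this ordering, two candidates with equal counts always resolve to the same label regardless of how the map happens to iterate, for example LabelTieBreak.pickLabel(Map(2L -> 1L, 1L -> 1L)) yields 1.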

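Sample code to reproduce this issue (run in spark-shell):
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.LabelPropagation
val vertices = spark.sparkContext.parallelize(Seq((1L, 1), (2L, 1)))
val edges = spark.sparkContext.parallelize(Seq(Edge(1L, 2L, 1)))
val g = Graph(vertices, edges)  // graph with two vertices joined by one edge
val c = LabelPropagation.run(g, 5)
c.vertices.map(x => (x._1, x._2)).toDF.show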


