Sorry if this is in the docs someplace and I'm missing it. I'm trying to implement label propagation in GraphX. The core step of that algorithm is
- for each vertex, find the most frequent label among its neighbors and set its label to that.

I think I see how to get the input from all the neighbors, but I don't see how to group/reduce those to find the most frequent label.

    var G2 = G.mapVertices((id,attr) => id)
    val perSrcCount: VertexRDD[(Long, Long)] = G2.mapReduceTriplets[(Long, Long)](
      edge => Iterator((edge.dstAttr, (edge.srcAttr, 1))),
      (a,b) => ((a._1), (a._2 + b._2)) // this line seems broken
    )

On the "broken" line above, I don't want to reduce all the values to a scalar, as this code does, but rather group them first and then reduce them. Can I do that all within mapReduceTriplets? If not, how do I build something that I can then further reduce?

Thanks in advance.
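
For what it's worth, one possible way to get the "group first, then reduce" behavior is to make each message a small histogram (label -> count) rather than a single pair, so that the reduce step merges histograms instead of collapsing to a scalar. The sketch below is plain Scala with no Spark dependency; the helper names mergeCounts and mostFrequent are made up for illustration, and the GraphX wiring in the trailing comment is an untested assumption about how they would plug into the call from the question. The tie-breaking rule (smallest label wins) is an arbitrary choice.

```scala
// Merge two per-vertex histograms (label -> count) by summing counts per label.
// This is associative and commutative, which a reduce function needs to be.
def mergeCounts(a: Map[Long, Long], b: Map[Long, Long]): Map[Long, Long] =
  b.foldLeft(a) { case (acc, (label, n)) =>
    acc.updated(label, acc.getOrElse(label, 0L) + n)
  }

// Pick the label with the highest count; ties go to the smallest label.
// Throws on an empty map, so only call it for vertices that received messages.
def mostFrequent(counts: Map[Long, Long]): Long =
  counts.maxBy { case (label, n) => (n, -label) }._1

// Hypothetical wiring into the question's code (assumed, not run here):
//   val counts: VertexRDD[Map[Long, Long]] =
//     G2.mapReduceTriplets[Map[Long, Long]](
//       edge => Iterator((edge.dstId, Map(edge.srcAttr -> 1L))),
//       mergeCounts)
//   val newLabels = counts.mapValues(m => mostFrequent(m))
```

Note that the sketch keys each message by edge.dstId rather than edge.dstAttr; the two happen to coincide here only because G2 sets every vertex attribute to its own id. Later Spark releases also provide aggregateMessages, which may be a better fit than mapReduceTriplets, but the histogram-merging idea would be the same.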