Sorry if this is in the docs someplace and I'm missing it.

I'm trying to implement label propagation in GraphX.  The core step of that
algorithm is 

- for each vertex, find the most frequent label among its neighbors and set
its label to that.

(I think) I see how to get the input from all the neighbors, but I don't see
how to group/reduce those to find the most frequent label.  

var G2 = G.mapVertices((id,attr) => id)
val perSrcCount: VertexRDD[(Long, Long)] = G2.mapReduceTriplets[(Long,
Long)](
  edge => Iterator((edge.dstAttr, (edge.srcAttr, 1))),
  (a,b) => ((a._1), (a._2 + b._2))       // this line seems broken
  )

It seems on the "broken" line above, I don't want to reduce all the values
to a scalar, as this code does, but rather group them first and then reduce
them.  Can I do that all within mapReduceTriples?  If not, how do I build
something that I can then further reduce?  

Thanks in advance.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-group-within-the-messages-at-a-vertex-tp14468.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to