[ https://issues.apache.org/jira/browse/SPARK-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002358#comment-14002358 ]
Glenn Strycker commented on SPARK-1883:
---------------------------------------

Sorry, this has been fixed -- https://issues.apache.org/jira/browse/SPARK-1188

Thanks to rxin for pointing this out on my email list question
http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-td6693.html

-----

This was an optimization that reuses a triplet object in GraphX, and when you do a collect directly on triplets, the same object is returned.

It has been fixed in Spark 1.0 here:
https://issues.apache.org/jira/browse/SPARK-1188

To work around this in older versions of Spark, you can add a copy step, e.g.

graph.triplets.map(_.copy()).collect()


> spark graph.triplets does not return correct values
> ---------------------------------------------------
>
>                 Key: SPARK-1883
>                 URL: https://issues.apache.org/jira/browse/SPARK-1883
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Glenn Strycker
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> graph.triplets does not work -- it returns incorrect results
> I have a graph with the following edges:
> orig_graph.edges.collect
> =  Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1),
> Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1), Edge(5,2,1),
> Edge(5,3,1), Edge(6,2,1), Edge(6,3,1), Edge(7,1,1), Edge(7,3,1))
> When I run triplets.collect, I only get the last edge repeated 16 times:
> orig_graph.triplets.collect
> = Array(((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
> ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
> ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
> ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1))
> I've also tried writing various map steps first before calling the triplet
> function, but I get the same results as above.
> Similarly, the example on the graphx programming guide page
> (http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html) is
> incorrect.
> val facts: RDD[String] =
>   graph.triplets.map(triplet =>
>     triplet.srcAttr._1 + " is the " + triplet.attr + " of " +
>     triplet.dstAttr._1)
> does not work, but
> val facts: RDD[String] =
>   graph.triplets.map(triplet =>
>     triplet.srcAttr + " is the " + triplet.attr + " of " + triplet.dstAttr)
> does work, although the results are meaningless.  For my graph example, I get
> the following line repeated 16 times:
> 1 is the 1 of 1



--
This message was sent by Atlassian JIRA
(v6.2#6252)
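
The object-reuse behaviour described in the comment can be reproduced without Spark. The following is only a minimal sketch in plain Scala, not GraphX's actual implementation: `Triplet`, `TripletReuseDemo`, and `reusingIterator` are hypothetical names made up for illustration, and the three edges are a shortened sample of the graph in the report. It shows why collecting references to one reused mutable object yields the last element repeated, and why a copy step before collecting avoids that.

```scala
// Hypothetical stand-in for a mutable triplet; NOT the GraphX EdgeTriplet class.
final class Triplet(var srcId: Long, var dstId: Long, var attr: Int) {
  // Snapshot the current field values into a fresh object.
  def copy(): Triplet = new Triplet(srcId, dstId, attr)
  override def toString: String = s"($srcId,$dstId,$attr)"
}

object TripletReuseDemo {
  // Shortened sample of the edges from the report (assumption: three are enough to show the effect).
  val edges: Seq[(Long, Long, Int)] = Seq((1L, 4L, 1), (1L, 5L, 1), (7L, 3L, 1))

  // Returns an iterator that overwrites and re-emits ONE shared object,
  // mimicking the reuse optimization described in the comment.
  def reusingIterator(): Iterator[Triplet] = {
    val shared = new Triplet(0L, 0L, 0)
    edges.iterator.map { case (s, d, a) =>
      shared.srcId = s
      shared.dstId = d
      shared.attr = a
      shared
    }
  }

  def main(args: Array[String]): Unit = {
    // Every list element is the same object, so all show the last edge.
    val broken = reusingIterator().toList
    // Copying each element as it is produced keeps the individual values.
    val fixed = reusingIterator().map(_.copy()).toList
    println(broken.mkString(", "))
    println(fixed.mkString(", "))
  }
}
```

In this sketch the copy happens lazily, element by element, before the shared object is overwritten again, which is the same idea as the `graph.triplets.map(_.copy()).collect()` workaround quoted above.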