Re: BUG: graph.triplets does not return proper values
Oh... ha, good point. Sorry, I'm new to mapreduce programming and forgot about that... I'll have to adjust my reduce function to output a vector/RDD as the element to return. Thanks for reminding me of this! -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6717.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Re: BUG: graph.triplets does not return proper values
Wait a minute... doesn't a reduce function return 1 element PER key pair? For example, word-count mapreduce functions return a {word, count} element for every unique word. Is this supposed to be a 1-element RDD object?

The .reduce function for a MappedRDD or FlatMappedRDD both are of the form

    def reduce(f: (T, T) => T): T

So presumably if I pass the reduce function a list of values {(X,1), (X,1), (X,1), (Y,1), (Y,1)} and the function is ( (A,B) => (A._1, A._2+B._2 ) ), then I should get a final vector of {(X,3), (Y,2)}, correct?

I have the following object:

    scala> temp3
    res128: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.Edge[Int], Int)] = MappedRDD[107] at map at <console>:27

and it contains the following:

    scala> temp3.collect
    . . .
    res129: Array[(org.apache.spark.graphx.Edge[Int], Int)] = Array((Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(4,4,1),1), (Edge(5,4,1),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(7,4,1),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(4,5,1),1), (Edge(5,5,1),1), (Edge(1,2,1),1), (Edge(1,3,1),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(7,5,1),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(4,7,1),1), (Edge(5,7,1),1), (Edge(0,0,0),1), (E...

but when I run the following, I only get one element in the final vector:

    scala> temp3.reduce( (A,B) => (A._1, A._2+B._2 ) )
    . . .
    res130: (org.apache.spark.graphx.Edge[Int], Int) = (Edge(0,0,0),256)

I should be additionally getting { (Edge(1,2,1),1), (Edge(1,3,1),2), (Edge(2,3,1),2), (Edge(4,5,1),1), (Edge(5,6,1),2), (Edge(6,7,1),1), (Edge(4,7,1),1), (Edge(5,7,1),2) }

Am I not mapping something correctly before running reduce? I've tried both .map and .flatMap, and put in _.copy() everywhere, e.g.

    temp3.flatMap(A => Seq(A)).reduce( (A,B) => (A._1, A._2+B._2 ) )
    temp3.map(_.copy()).flatMap(A => Seq(A)).reduce( (A,B) => (A._1, A._2+B._2 ) )

etc.
Re: BUG: graph.triplets does not return proper values
You are probably looking for reduceByKey in that case. reduce just reduces everything in the collection into a single element.

On Tue, May 20, 2014 at 12:16 PM, GlennStrycker glenn.stryc...@gmail.com wrote:

> Wait a minute... doesn't a reduce function return 1 element PER key pair? For example, word-count mapreduce functions return a {word, count} element for every unique word. Is this supposed to be a 1-element RDD object? [...]
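The difference between the two operations can be seen without a cluster. The following is a minimal sketch using plain Scala collections rather than RDDs, with the (X,1)/(Y,1) pairs from the question earlier in the thread. The object name and the groupBy-based stand-in for reduceByKey are assumptions made for illustration; on a pair RDD the corresponding call would be reduceByKey(_ + _).

```scala
object ReduceVsReduceByKey {
  // The example (key, count) pairs from the question.
  val pairs: List[(String, Int)] =
    List(("X", 1), ("X", 1), ("X", 1), ("Y", 1), ("Y", 1))

  // reduce takes a function (T, T) => T and folds the WHOLE collection
  // into one value of type T. Here the first key is kept and all counts
  // are added together, regardless of key.
  val single: (String, Int) = pairs.reduce((a, b) => (a._1, a._2 + b._2))

  // Per-key reduction: group the pairs by key, then sum inside each group.
  // This is a local stand-in for what reduceByKey(_ + _) computes.
  val perKey: Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    println(single) // one pair for the whole list
    println(perKey) // one entry per distinct key
  }
}
```

Here single is one pair carrying the total of all five counts, which mirrors the (Edge(0,0,0),256) result above, while perKey holds one count per distinct key.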
Re: BUG: graph.triplets does not return proper values
I don't seem to have this function in my Spark installation for this object, or the classes MappedRDD, FlatMappedRDD, EdgeRDD, VertexRDD, or Graph. Which class should have the reduceByKey function, and how do I cast my current RDD as this class? Perhaps this is still due to my Spark installation being out-of-date?
Re: BUG: graph.triplets does not return proper values
That's all very old functionality in Spark terms, so it shouldn't have anything to do with your installation being out-of-date. There is also no need to cast as long as the relevant implicit conversions are in scope:

    import org.apache.spark.SparkContext._

On Tue, May 20, 2014 at 1:00 PM, GlennStrycker glenn.stryc...@gmail.com wrote:

> I don't seem to have this function in my Spark installation for this object, or the classes MappedRDD, FlatMappedRDD, EdgeRDD, VertexRDD, or Graph. Which class should have the reduceByKey function, and how do I cast my current RDD as this class? Perhaps this is still due to my Spark installation being out-of-date?
Re: BUG: graph.triplets does not return proper values
http://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions

It becomes automagically available when your RDD contains pairs.

On Tue, May 20, 2014 at 9:00 PM, GlennStrycker glenn.stryc...@gmail.com wrote:

> I don't seem to have this function in my Spark installation for this object, or the classes MappedRDD, FlatMappedRDD, EdgeRDD, VertexRDD, or Graph. Which class should have the reduceByKey function, and how do I cast my current RDD as this class? Perhaps this is still due to my Spark installation being out-of-date?
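To illustrate why no cast is needed, here is a rough sketch of the same language mechanism in plain Scala. It is not Spark's code: PairSeqFunctions and its reduceByKey are hypothetical names that imitate how PairRDDFunctions attaches extra methods to a collection only when its elements are pairs.

```scala
object ImplicitPairFunctions {
  // Hypothetical stand-in for PairRDDFunctions: the wrapper applies only
  // to sequences whose element type is a (K, V) pair.
  implicit class PairSeqFunctions[K, V](val self: Seq[(K, V)]) {
    // Combine all values that share a key using f.
    def reduceByKey(f: (V, V) => V): Map[K, V] =
      self.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2).reduce(f)) }
  }

  // Seq has no reduceByKey of its own; the compiler finds the implicit
  // wrapper because the elements are pairs and the wrapper is in scope.
  val counts: Map[String, Int] =
    Seq(("a", 1), ("b", 1), ("a", 1)).reduceByKey(_ + _)
}
```

Calling reduceByKey on a Seq[Int] would not compile, because the wrapper does not apply to non-pair elements. In Spark 0.9.x the equivalent wrapper is brought into scope by import org.apache.spark.SparkContext._, as mentioned above.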
Re: BUG: graph.triplets does not return proper values
For some reason it does not appear when I hit tab in Spark shell, but when I put everything together in one line, it DOES WORK!

    orig_graph.edges.map(_.copy()).cartesian(orig_graph.edges.map(_.copy())).flatMap( A => Seq(if (A._1.srcId == A._2.dstId) Edge(A._2.srcId,A._1.dstId,1) else if (A._1.dstId == A._2.srcId) Edge(A._1.srcId,A._2.dstId,1) else Edge(0,0,0) ) ).map(word => (word, 1)).reduceByKey(_ + _).collect

    = Array((Edge(5,7,1),4), (Edge(5,6,1),4), (Edge(3,2,1),4), (Edge(5,5,1),3), (Edge(1,3,1),4), (Edge(2,3,1),4), (Edge(6,5,1),4), (Edge(5,4,1),2), (Edge(2,1,1),2), (Edge(6,7,1),2), (Edge(2,2,1),2), (Edge(7,5,1),4), (Edge(3,1,1),4), (Edge(4,5,1),2), (Edge(0,0,0),192), (Edge(3,3,1),3), (Edge(4,7,1),2), (Edge(1,2,1),2), (Edge(4,4,1),1), (Edge(6,6,1),2), (Edge(7,4,1),2), (Edge(7,6,1),2), (Edge(7,7,1),2), (Edge(1,1,1),3))
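For readers who want to follow the join-then-count pattern locally, here is a simplified sketch in plain Scala collections. It is not equivalent to the one-liner above: it matches only a.dst == b.src and skips non-matching pairs instead of emitting Edge(0,0,0), so its counts differ from the quoted output. The case class E is a hypothetical minimal edge type, and the edge list is the one from the original report.

```scala
object TwoHopCounts {
  // Hypothetical minimal edge type; GraphX's Edge also carries an attribute.
  final case class E(src: Long, dst: Long)

  // The 16 directed edges listed in the original report.
  val edges: List[E] = List(
    E(1, 4), E(1, 5), E(1, 7), E(2, 5), E(2, 6), E(3, 5), E(3, 6), E(3, 7),
    E(4, 1), E(5, 1), E(5, 2), E(5, 3), E(6, 2), E(6, 3), E(7, 1), E(7, 3))

  // Pair every edge a with every edge b that starts where a ends and
  // emit the two-hop endpoint pair (a.src, b.dst).
  val twoHop: List[(Long, Long)] =
    for (a <- edges; b <- edges if a.dst == b.src) yield (a.src, b.dst)

  // Count how many two-hop paths connect each (src, dst) pair; this is
  // the local analogue of map(p => (p, 1)).reduceByKey(_ + _).
  val counts: Map[(Long, Long), Int] =
    twoHop.groupBy(identity).map { case (pair, hits) => (pair, hits.size) }
}
```

Filtering inside the for-comprehension avoids the large sentinel bucket, which in the output above accounts for 192 of the 256 edge pairs.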
Re: BUG: graph.triplets does not return proper values
This was an optimization that reuses a triplet object in GraphX, and when you do a collect directly on triplets, the same object is returned. It has been fixed in Spark 1.0 here: https://issues.apache.org/jira/browse/SPARK-1188

To work around it in older versions of Spark, you can add a copy step, e.g.

    graph.triplets.map(_.copy()).collect()

On Mon, May 19, 2014 at 1:09 PM, GlennStrycker glenn.stryc...@gmail.com wrote:

> graph.triplets does not work -- it returns incorrect results
>
> I have a graph with the following edges:
>
>     orig_graph.edges.collect = Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1), Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1), Edge(5,2,1), Edge(5,3,1), Edge(6,2,1), Edge(6,3,1), Edge(7,1,1), Edge(7,3,1))
>
> When I run triplets.collect, I only get the last edge repeated 16 times:
>
>     orig_graph.triplets.collect = Array(((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1))
>
> I've also tried writing various map steps first before calling the triplet function, but I get the same results as above.
>
> Similarly, the example on the graphx programming guide page (http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html) is incorrect.
>
>     val facts: RDD[String] = graph.triplets.map(triplet => triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1)
>
> does not work, but
>
>     val facts: RDD[String] = graph.triplets.map(triplet => triplet.srcAttr + " is the " + triplet.attr + " of " + triplet.dstAttr)
>
> does work, although the results are meaningless. For my graph example, I get the following line repeated 16 times:
>
>     1 is the 1 of 1
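The effect of reusing one object across an iterator can be reproduced in a few lines of plain Scala. This is only an illustration of the aliasing problem and of why a copy step helps, not GraphX's actual implementation; Cell and reusing are hypothetical names.

```scala
object ReusedObjectDemo {
  // Hypothetical mutable record standing in for a reused triplet object.
  final class Cell(var value: Int) {
    def copy(): Cell = new Cell(value)
  }

  // An iterator that overwrites and returns the SAME Cell on every step.
  def reusing(values: Seq[Int]): Iterator[Cell] = {
    val shared = new Cell(0)
    values.iterator.map { v => shared.value = v; shared }
  }

  // Collecting the references directly: every slot points at the shared
  // object, so every entry shows the last value that was written.
  val aliased: List[Int] = reusing(Seq(1, 2, 3)).toList.map(_.value)

  // Copying each element as it is produced keeps the individual values.
  val copied: List[Int] =
    reusing(Seq(1, 2, 3)).map(_.copy()).toList.map(_.value)
}
```

In this sketch aliased contains the last value three times, matching the "last edge repeated 16 times" symptom, while copied keeps 1, 2, 3. That is the same idea as inserting map(_.copy()) before collect().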
Re: BUG: graph.triplets does not return proper values
Thanks, rxin, this worked! I am having a similar problem with .reduce... do I need to insert .copy() functions in that statement as well?

This part works:

    orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge) ).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).collect

    = Array((Edge(1,4,1),1), (Edge(1,5,1),1), (Edge(1,7,1),1), (Edge(2,5,1),1), (Edge(2,6,1),1), (Edge(3,5,1),1), (Edge(3,6,1),1), (Edge(3,7,1),1), (Edge(4,1,1),1), (Edge(5,1,1),1), (Edge(5,2,1),1), (Edge(5,3,1),1), (Edge(6,2,1),1), (Edge(6,3,1),1), (Edge(7,1,1),1), (Edge(7,3,1),1))

But when I try adding on a reduce statement, I only get one element, not 16:

    orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge) ).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce( (A,B) => { if (A._1.dstId == B._1.srcId) (Edge(A._1.srcId, B._1.dstId, 2), 1) else if (A._1.srcId == B._1.dstId) (Edge(B._1.srcId, A._1.dstId, 2), 1) else (Edge(0, 0, 0), 0) } )

    = (Edge(0,0,0),0)
Re: BUG: graph.triplets does not return proper values
I tried adding .copy() everywhere, but still only get one element returned, not even an RDD object.

    orig_graph.edges.map(_.copy()).flatMap(edge => Seq(edge) ).map(edge => (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce( (A,B) => { if (A._1.copy().dstId == B._1.copy().srcId) (Edge(A._1.copy().srcId, B._1.copy().dstId, 2), 1) else if (A._1.copy().srcId == B._1.copy().dstId) (Edge(B._1.copy().srcId, A._1.copy().dstId, 2), 1) else (Edge(0, 0, 3), 1) } )

    = (Edge(0,0,3),1)

I'll try getting a fresh copy of the Spark 1.0 code and see if I can get it to work. Thanks for your help!!
Re: BUG: graph.triplets does not return proper values
reduce always returns a single element - maybe you are misunderstanding what the reduce function in collections does.

On Mon, May 19, 2014 at 3:32 PM, GlennStrycker glenn.stryc...@gmail.com wrote:

> I tried adding .copy() everywhere, but still only get one element returned, not even an RDD object. [...]