Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread GlennStrycker
Oh... ha, good point.  Sorry, I'm new to mapreduce programming and forgot
about that... I'll have to adjust my reduce function to output a vector/RDD
as the element to return.  Thanks for reminding me of this!



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6717.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread GlennStrycker
Wait a minute... doesn't a reduce function return 1 element PER key pair? 
For example, word-count mapreduce functions return a {word, count} element
for every unique word.  Is this supposed to be a 1-element RDD object?

The .reduce function for a MappedRDD or FlatMappedRDD both are of the form

def reduce(f: (T, T) = T): T

So presumably if I pass the reduce function a list of values {(X,1), (X,1),
(X,1), (Y,1), (Y,1)} and the function is ( (A,B) = (A._1, A._2+B._2 ) ),
then I should get a final vector of {(X,3), (Y,2)}, correct?


I have the following object:

scala temp3
res128: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.Edge[Int],
Int)] = MappedRDD[107] at map at console:27

and it contains the following:

scala temp3.collect
. . .
res129: Array[(org.apache.spark.graphx.Edge[Int], Int)] =
Array((Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
(Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
(Edge(4,4,1),1), (Edge(5,4,1),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
(Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(7,4,1),1), (Edge(0,0,0),1),
(Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
(Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
(Edge(4,5,1),1), (Edge(5,5,1),1), (Edge(1,2,1),1), (Edge(1,3,1),1),
(Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(7,5,1),1), (Edge(0,0,0),1),
(Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
(Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
(Edge(4,7,1),1), (Edge(5,7,1),1), (Edge(0,0,0),1), (E...

but when I run the following, I only get one element in the final vector:

scala temp3.reduce( (A,B) = (A._1, A._2+B._2 ) )
. . .
res130: (org.apache.spark.graphx.Edge[Int], Int) = (Edge(0,0,0),256)

I should be additionally getting { (Edge(1,2,1),1), (Edge(1,3,1),2),
(Edge(2,3,1),2), (Edge(4,5,1),1), (Edge(5,6,1),2), (Edge(6,7,1),1),
(Edge(4,7,1),1), (Edge(5,7,1),2) }



Am I not mapping something correctly before running reduce?  I've tried both
.map and .flatMap, and put in _.copy() everywhere, e.g.

temp3.flatMap(A = Seq(A)).reduce( (A,B) = (A._1, A._2+B._2 ) )
temp3.map(_.copy()).flatMap(A = Seq(A)).reduce( (A,B) = (A._1, A._2+B._2 )
)
etc.





--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6726.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread Reynold Xin
You are probably looking for reduceByKey in that case.

reduce just reduces everything in the collection into a single element.


On Tue, May 20, 2014 at 12:16 PM, GlennStrycker glenn.stryc...@gmail.comwrote:

 Wait a minute... doesn't a reduce function return 1 element PER key pair?
 For example, word-count mapreduce functions return a {word, count} element
 for every unique word.  Is this supposed to be a 1-element RDD object?

 The .reduce function for a MappedRDD or FlatMappedRDD both are of the form

 def reduce(f: (T, T) = T): T

 So presumably if I pass the reduce function a list of values {(X,1), (X,1),
 (X,1), (Y,1), (Y,1)} and the function is ( (A,B) = (A._1, A._2+B._2 ) ),
 then I should get a final vector of {(X,3), (Y,2)}, correct?


 I have the following object:

 scala temp3
 res128: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.Edge[Int],
 Int)] = MappedRDD[107] at map at console:27

 and it contains the following:

 scala temp3.collect
 . . .
 res129: Array[(org.apache.spark.graphx.Edge[Int], Int)] =
 Array((Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
 (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
 (Edge(4,4,1),1), (Edge(5,4,1),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
 (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(7,4,1),1), (Edge(0,0,0),1),
 (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
 (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
 (Edge(4,5,1),1), (Edge(5,5,1),1), (Edge(1,2,1),1), (Edge(1,3,1),1),
 (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(7,5,1),1), (Edge(0,0,0),1),
 (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
 (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1), (Edge(0,0,0),1),
 (Edge(4,7,1),1), (Edge(5,7,1),1), (Edge(0,0,0),1), (E...

 but when I run the following, I only get one element in the final vector:

 scala temp3.reduce( (A,B) = (A._1, A._2+B._2 ) )
 . . .
 res130: (org.apache.spark.graphx.Edge[Int], Int) = (Edge(0,0,0),256)

 I should be additionally getting { (Edge(1,2,1),1), (Edge(1,3,1),2),
 (Edge(2,3,1),2), (Edge(4,5,1),1), (Edge(5,6,1),2), (Edge(6,7,1),1),
 (Edge(4,7,1),1), (Edge(5,7,1),2) }



 Am I not mapping something correctly before running reduce?  I've tried
 both
 .map and .flatMap, and put in _.copy() everywhere, e.g.

 temp3.flatMap(A = Seq(A)).reduce( (A,B) = (A._1, A._2+B._2 ) )
 temp3.map(_.copy()).flatMap(A = Seq(A)).reduce( (A,B) = (A._1, A._2+B._2
 )
 )
 etc.





 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6726.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.



Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread GlennStrycker
I don't seem to have this function in my Spark installation for this object,
or the classes MappedRDD, FlatMappedRDD, EdgeRDD, VertexRDD, or Graph.

Which class should have the reduceByKey function, and how do I cast my
current RDD as this class?

Perhaps this is still due to my Spark installation being out-of-date?



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6728.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread Mark Hamstra
That's all very old functionality in Spark terms, so it shouldn't have
anything to do with your installation being out-of-date.  There is also no
need to cast as long as the relevant implicit conversions are in scope:
import org.apache.spark.SparkContext._


On Tue, May 20, 2014 at 1:00 PM, GlennStrycker glenn.stryc...@gmail.comwrote:

 I don't seem to have this function in my Spark installation for this
 object,
 or the classes MappedRDD, FlatMappedRDD, EdgeRDD, VertexRDD, or Graph.

 Which class should have the reduceByKey function, and how do I cast my
 current RDD as this class?

 Perhaps this is still due to my Spark installation being out-of-date?



 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6728.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.



Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread Sean Owen
http://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions

It becomes automagically available when your RDD contains pairs.

On Tue, May 20, 2014 at 9:00 PM, GlennStrycker glenn.stryc...@gmail.com wrote:
 I don't seem to have this function in my Spark installation for this object,
 or the classes MappedRDD, FlatMappedRDD, EdgeRDD, VertexRDD, or Graph.

 Which class should have the reduceByKey function, and how do I cast my
 current RDD as this class?

 Perhaps this is still due to my Spark installation being out-of-date?



 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6728.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: BUG: graph.triplets does not return proper values

2014-05-20 Thread GlennStrycker
For some reason it does not appear when I hit tab in Spark shell, but when
I put everything together in one line, it DOES WORK!

orig_graph.edges.map(_.copy()).cartesian(orig_graph.edges.map(_.copy())).flatMap(
A = Seq(if (A._1.srcId == A._2.dstId) Edge(A._2.srcId,A._1.dstId,1) else if
(A._1.dstId == A._2.srcId) Edge(A._1.srcId,A._2.dstId,1) else Edge(0,0,0) )
).map(word = (word, 1)).reduceByKey(_ + _).collect

= Array((Edge(5,7,1),4), (Edge(5,6,1),4), (Edge(3,2,1),4), (Edge(5,5,1),3),
(Edge(1,3,1),4), (Edge(2,3,1),4), (Edge(6,5,1),4), (Edge(5,4,1),2),
(Edge(2,1,1),2), (Edge(6,7,1),2), (Edge(2,2,1),2), (Edge(7,5,1),4),
(Edge(3,1,1),4), (Edge(4,5,1),2), (Edge(0,0,0),192), (Edge(3,3,1),3),
(Edge(4,7,1),2), (Edge(1,2,1),2), (Edge(4,4,1),1), (Edge(6,6,1),2),
(Edge(7,4,1),2), (Edge(7,6,1),2), (Edge(7,7,1),2), (Edge(1,1,1),3))




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6730.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
This was an optimization that reuses a triplet object in GraphX, and when
you do a collect directly on triplets, the same object is returned.

It has been fixed in Spark 1.0 here:
https://issues.apache.org/jira/browse/SPARK-1188

To work around in older version of Spark, you can add a copy step to it,
e.g.

graph.triplets.map(_.copy()).collect()



On Mon, May 19, 2014 at 1:09 PM, GlennStrycker glenn.stryc...@gmail.comwrote:

 graph.triplets does not work -- it returns incorrect results

 I have a graph with the following edges:

 orig_graph.edges.collect
 =  Array(Edge(1,4,1), Edge(1,5,1), Edge(1,7,1), Edge(2,5,1), Edge(2,6,1),
 Edge(3,5,1), Edge(3,6,1), Edge(3,7,1), Edge(4,1,1), Edge(5,1,1),
 Edge(5,2,1), Edge(5,3,1), Edge(6,2,1), Edge(6,3,1), Edge(7,1,1),
 Edge(7,3,1))

 When I run triplets.collect, I only get the last edge repeated 16 times:

 orig_graph.triplets.collect
 = Array(((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
 ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
 ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1),
 ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1), ((7,1),(3,1),1))

 I've also tried writing various map steps first before calling the triplet
 function, but I get the same results as above.

 Similarly, the example on the graphx programming guide page
 (http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html) is
 incorrect.

 val facts: RDD[String] =
   graph.triplets.map(triplet =
 triplet.srcAttr._1 +  is the  + triplet.attr +  of  +
 triplet.dstAttr._1)

 does not work, but

 val facts: RDD[String] =
   graph.triplets.map(triplet =
 triplet.srcAttr +  is the  + triplet.attr +  of  + triplet.dstAttr)

 does work, although the results are meaningless.  For my graph example, I
 get the following line repeated 16 times:

 1 is the 1 of 1



 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.



Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
Thanks, rxin, this worked!

I am having a similar problem with .reduce... do I need to insert .copy()
functions in that statement as well?

This part works:
orig_graph.edges.map(_.copy()).flatMap(edge = Seq(edge) ).map(edge =
(Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).collect

=Array((Edge(1,4,1),1), (Edge(1,5,1),1), (Edge(1,7,1),1), (Edge(2,5,1),1),
(Edge(2,6,1),1), (Edge(3,5,1),1), (Edge(3,6,1),1), (Edge(3,7,1),1),
(Edge(4,1,1),1), (Edge(5,1,1),1), (Edge(5,2,1),1), (Edge(5,3,1),1),
(Edge(6,2,1),1), (Edge(6,3,1),1), (Edge(7,1,1),1), (Edge(7,3,1),1))

But when I try adding on a reduce statement, I only get one element, not 16:
orig_graph.edges.map(_.copy()).flatMap(edge = Seq(edge) ).map(edge =
(Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce(
(A,B) = { if (A._1.dstId == B._1.srcId) (Edge(A._1.srcId, B._1.dstId, 2),
1) else if (A._1.srcId == B._1.dstId) (Edge(B._1.srcId, A._1.dstId, 2), 1)
else (Edge(0, 0, 0), 0) } )

=(Edge(0,0,0),0)



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6695.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread GlennStrycker
I tried adding .copy() everywhere, but still only get one element returned,
not even an RDD object.

orig_graph.edges.map(_.copy()).flatMap(edge = Seq(edge) ).map(edge =
(Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce(
(A,B) = { if (A._1.copy().dstId == B._1.copy().srcId)
(Edge(A._1.copy().srcId, B._1.copy().dstId, 2), 1) else if
(A._1.copy().srcId == B._1.copy().dstId) (Edge(B._1.copy().srcId,
A._1.copy().dstId, 2), 1) else (Edge(0, 0, 3), 1) } )

= (Edge(0,0,3),1)

I'll try getting a fresh copy of the Spark 1.0 code and see if I can get it
to work.  Thanks for your help!!



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6697.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: BUG: graph.triplets does not return proper values

2014-05-19 Thread Reynold Xin
reduce always return a single element - maybe you are misunderstanding what
the reduce function in collections does.


On Mon, May 19, 2014 at 3:32 PM, GlennStrycker glenn.stryc...@gmail.comwrote:

 I tried adding .copy() everywhere, but still only get one element returned,
 not even an RDD object.

 orig_graph.edges.map(_.copy()).flatMap(edge = Seq(edge) ).map(edge =
 (Edge(edge.copy().srcId, edge.copy().dstId, edge.copy().attr), 1)).reduce(
 (A,B) = { if (A._1.copy().dstId == B._1.copy().srcId)
 (Edge(A._1.copy().srcId, B._1.copy().dstId, 2), 1) else if
 (A._1.copy().srcId == B._1.copy().dstId) (Edge(B._1.copy().srcId,
 A._1.copy().dstId, 2), 1) else (Edge(0, 0, 3), 1) } )

 = (Edge(0,0,3),1)

 I'll try getting a fresh copy of the Spark 1.0 code and see if I can get it
 to work.  Thanks for your help!!



 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/BUG-graph-triplets-does-not-return-proper-values-tp6693p6697.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.