error accessing vertexRDD
Hi all, a question on an issue I'm having with a VertexRDD. If I kick off my spark-shell with something like this: then run: it will finish and give me the count, but I see a few errors (see below). This is okay for this small dataset, but when trying with a large dataset it doesn't finish because of the number of errors. It works okay if I kick off my spark-shell with master = local. Any help appreciated.
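For context, a minimal sketch of the kind of setup being described; the actual launch command, data and count call were not included in the post, so the master URL, file path and graph construction below are all assumptions:

    // Launched against a cluster rather than master=local, e.g.:
    // $ spark-shell --master spark://master-host:7077

    import org.apache.spark.graphx._

    // Build a graph from an assumed edge-list file.
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")

    // Counting the VertexRDD forces the stages that were erroring out.
    println(graph.vertices.count())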
Re: graphx class not found error
Oh, forgot to note: I'm using the Scala REPL for this.
graphx class not found error
The code below works perfectly in both cluster and local mode, but when I try to create a graph in cluster mode I get the following error (graph creation works fine in local mode): any help appreciated.
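For reference, graph construction along these lines is typically the step that behaves differently between local and cluster mode, because classes referenced in the closures must be shipped to the executors. The inputs below are made up, since the original code and error text were not included:

    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // Assumed shape of the inputs.
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b")))
    val edges: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 1)))

    // Graph construction is lazy; a ClassNotFoundException usually only
    // surfaces once an action deserializes the closures on the executors.
    val graph = Graph(vertices, edges)
    graph.edges.count()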
pregel graphx job not finishing
Hi, I'm currently using a Pregel message-passing function on my graph in Spark and GraphX. The problem I have is that the code runs perfectly on Spark 1.0 and finishes in a couple of minutes, but now that we have upgraded I'm trying to run the same code on 1.3 and it doesn't finish (I left it overnight and it was still going), and I get a lot of messages as follows (this doesn't happen in v1.0).
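The poster's vertex program isn't shown, but for reference, this is the general shape of a GraphX Pregel call: the standard single-source shortest-paths example, assuming a graph with Double edge attributes:

    import org.apache.spark.graphx._

    val sourceId: VertexId = 1L
    val initialGraph = graph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)

    val sssp = initialGraph.pregel(Double.PositiveInfinity)(
      (id, dist, newDist) => math.min(dist, newDist),   // vertex program
      triplet =>                                        // send messages
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        else Iterator.empty,
      (a, b) => math.min(a, b)                          // merge messages
    )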
Re: flatmapping with other data
Sorry, cut-and-paste error; the resulting dataset I want is this:

({(101,S)=3}, piece_of_data_1)
({(101,S)=3}, piece_of_data_2)
({(101,S)=1}, piece_of_data_3)
({(109,S)=2}, piece_of_data_3)
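A minimal sketch of one way to produce pairs of that shape; the input layout is an assumption, since the rest of the thread isn't shown here:

    // Hypothetical input: each record pairs a keyed count with the
    // pieces of data it belongs with.
    val records = sc.parallelize(Seq(
      (Map((101, "S") -> 3), Seq("piece_of_data_1", "piece_of_data_2")),
      (Map((101, "S") -> 1), Seq("piece_of_data_3")),
      (Map((109, "S") -> 2), Seq("piece_of_data_3"))
    ))

    // flatMap emits one output pair per (count-map, piece) combination.
    val result = records.flatMap { case (counts, pieces) =>
      pieces.map(p => (counts, p))
    }
    result.collect().foreach(println)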
propagating edges
Hi all, looking for some help with propagating values along edges. What I want to achieve (see diagram) is, for each connected part of the graph, to assign an incrementing value to each of the out-links from the root node; the numbering restarts for the next part of the graph. I.e. node 1 has out-links to nodes 2, 3 and 4, so the edge attributes for these will be 1, 2 and 3 respectively. Each out-link from those nodes then keeps this value right through to the final node on its path. For node 9, with an out-link to 10, the edge attribute is 1, and so on. Thanks in advance for any help :) dave http://apache-spark-user-list.1001560.n3.nabble.com/file/n21086/Drawing2.jpg
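A sketch of one possible approach, not a tested solution: find the roots, number each root's out-edges, then push the branch number downstream with Pregel. Nodes reached from two branches keep the larger label here, which is a simplification of the requirement:

    import org.apache.spark.graphx._

    // Roots are vertices with no in-edges (absent from inDegrees).
    val roots = graph.vertices.leftOuterJoin(graph.inDegrees)
      .filter { case (_, (_, d)) => d.isEmpty }
      .keys.collect().toSet

    // Number each root's out-edges 1, 2, 3, ...; 0 marks "unlabelled".
    val numbered = graph.edges.groupBy(_.srcId).flatMap { case (src, es) =>
      if (roots.contains(src))
        es.zipWithIndex.map { case (e, i) => Edge(e.srcId, e.dstId, i + 1) }
      else es.map(e => Edge(e.srcId, e.dstId, 0))
    }

    // Each vertex adopts the label of the branch that reached it.
    val labelled = Graph(graph.vertices.mapValues(_ => 0), numbered).pregel(0)(
      (_, attr, msg) => math.max(attr, msg),
      t => {
        val label = if (t.attr != 0) t.attr else t.srcAttr
        if (label != 0 && t.dstAttr != label) Iterator((t.dstId, label))
        else Iterator.empty
      },
      (a, b) => math.max(a, b)
    )

    // Finally, stamp each unlabelled edge with its source's label.
    val propagated =
      labelled.mapTriplets(t => if (t.attr != 0) t.attr else t.srcAttr)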
equivalent to sql in
I have an RDD I want to filter, and for a single term it all works well, i.e. dataRDD.filter(x => x._2 == "apple"). How can I use multiple values? For example, if I wanted to filter my RDD to take out apples and oranges and pears without chaining conditions together; that could get long-winded as there may be quite a few. Can you filter using a set or a list? thanks
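For illustration, filtering against a Set is the usual idiom; a minimal sketch, assuming the RDD holds pairs whose second element is the fruit name:

    val wanted = Set("apple", "orange", "pear")

    // Keep only records whose value is in the set ...
    val matches = dataRDD.filter { case (_, v) => wanted.contains(v) }

    // ... or negate the test to take those values out.
    val without = dataRDD.filter { case (_, v) => !wanted.contains(v) }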
graphx extracting the path
Hi all, just wondering if there is a way to extract paths in GraphX. For example, given the attached graph, I would like to return results along the lines of:

101 - 103
101 - 104 - 108
102 - 105
102 - 106 - 107

http://apache-spark-user-list.1001560.n3.nabble.com/file/n17936/graph.jpg
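A sketch of one way to collect root-to-leaf paths with Pregel; this assumes a small DAG like the one pictured (path sets grow quickly on larger graphs, so this is illustrative rather than scalable):

    import org.apache.spark.graphx._

    // Roots (no in-edges) each start with the path containing just themselves.
    val roots = graph.vertices.leftOuterJoin(graph.inDegrees)
      .filter { case (_, (_, d)) => d.isEmpty }.keys.collect().toSet

    val init = graph.mapVertices { (id, _) =>
      if (roots.contains(id)) Set(List(id)) else Set.empty[List[VertexId]]
    }

    val withPaths = init.pregel(Set.empty[List[VertexId]])(
      (_, paths, incoming) => paths ++ incoming,
      t => {
        // Extend every path known at the source by the destination vertex.
        val extended = t.srcAttr.map(_ :+ t.dstId)
        if ((extended -- t.dstAttr).nonEmpty) Iterator((t.dstId, extended))
        else Iterator.empty
      },
      (a, b) => a ++ b
    )

    // Leaves (no out-edges) now hold complete root-to-leaf paths.
    withPaths.vertices.leftOuterJoin(graph.outDegrees)
      .filter { case (_, (_, d)) => d.isEmpty }
      .flatMap { case (_, (paths, _)) => paths }
      .collect().foreach(p => println(p.mkString(" - ")))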
kryo serializer
Hi all, how can I tell if my Kryo serializer is actually working? I have a class which extends Serializable and I have included the following imports:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

I also have this class:

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[Person])
  }
}

My configuration includes spark.serializer=org.apache.spark.serializer.KryoSerializer, but I'm not sure if it's working (the code runs fine, although the parts using larger datasets are pretty slow). I see in the documentation I should use conf.set("spark.kryo.registrator", "mypackage.MyRegistrator"), but as I'm using the Scala REPL I'm not sure where this should go or what format it should be in, which leads me to wonder whether I'm actually using the default Java serialization. Is there a way I can tell, and how do I go about including the MyRegistrator bit above? Thanks in advance for any help, D
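For reference, one way to wire this up for spark-shell; note the registrator has to be a compiled class on the classpath rather than one defined inside the REPL, since the property takes a fully-qualified class name (the jar name below is an assumption):

    $ spark-shell \
        --jars my-classes.jar \
        --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
        --conf spark.kryo.registrator=mypackage.MyRegistrator

Then, inside the REPL, you can check what the running context is actually using:

    // Some(org.apache.spark.serializer.KryoSerializer) when Kryo is active;
    // None means the default Java serializer is in use.
    sc.getConf.getOption("spark.serializer")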
creating a subgraph with an edge predicate
I'm currently creating a subgraph using the vertex predicate: subgraph(vpred = (vid, attr) => attr.split(",")(2) != "999"), but I'm wondering whether a subgraph can be created using the edge predicate; if so, a sample would be great :) thanks Dave
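For reference, Graph.subgraph also takes an edge predicate over the EdgeTriplet, and the two predicates can be combined; a small sketch with made-up attribute tests:

    // Keep only edges whose attribute passes a test; the triplet also
    // exposes srcAttr and dstAttr if the predicate needs vertex data.
    val sub = graph.subgraph(epred = t => t.attr != 999)

    // Both predicates in one call:
    val sub2 = graph.subgraph(
      epred = t => t.attr != 999,
      vpred = (vid, attr) => attr.split(",")(2) != "999"
    )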
counting degrees graphx
Hi, looking for a little help on counting the degrees in a graph. Currently my graph consists of 2 subgraphs and it looks like this:

val vertexArray = Array(
  (1L, (101, "x")),
  (2L, (102, "y")),
  (3L, (103, "y")),
  (4L, (104, "y")),
  (5L, (105, "y")),
  (6L, (106, "x")),
  (7L, (107, "x")),
  (8L, (108, "y"))
)

val edgeArray = Array(
  Edge(1L, 2L, 1),
  Edge(1L, 3L, 2),
  Edge(3L, 4L, 3),
  Edge(3L, 5L, 4),
  Edge(6L, 5L, 5),
  Edge(7L, 8L, 6)
)

Now I can summarize the graphs using connected components as such:

val cc = userGraph.connectedComponents
userGraph.vertices.leftJoin(cc.vertices) {
  case (id, tfn, ent) => s"$tfn is in component $ent"
}.collect.foreach { case (id, str) => println(str) }

and I get the results I expect:

(101,SH,0,2) is in component Some(1)
(102,COY,1,0) is in component Some(1)
(103,COY,1,2) is in component Some(1)
(104,COY,1,0) is in component Some(1)
(105,COY,2,0) is in component Some(1)
(106,SH,0,1) is in component Some(1)
(107,SH,0,1) is in component Some(7)
(108,COY,1,0) is in component Some(7)

This essentially gives me two clusters, with the root nodes being ID 1 and ID 7. What I really want to do is identify the third cluster that is built into this graph (the connection between 106 and 105) as another cluster; my results would identify that 105 was in cluster 1 and in the new cluster. Any help appreciated. cheers
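Since the subject is degrees, for reference the degree counts come straight off the graph; a quick sketch against the sample data above:

    // inDegrees / outDegrees are VertexRDD[Int]; vertices with a zero
    // degree are simply absent from the result.
    userGraph.inDegrees.collect().foreach(println)   // e.g. (5,2), (2,1), ...
    userGraph.outDegrees.collect().foreach(println)  // e.g. (1,2), (3,2), ...
    userGraph.degrees.collect().foreach(println)     // in + out combined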
Re: counting degrees graphx
Yes, that's correct: I want the vertex set for each source vertex in the graph. Which of course leads me on to my next question, which is to add a level to each of these. http://apache-spark-user-list.1001560.n3.nabble.com/file/n6383/image1.jpg For example, the image shows the in- and out-links of the graph and shows my structure. I want the list of vertices under 1, 6 and 7: I need to show that vertex 1 has members 2, 3, 4 and 5; vertex 6 has member 5; and vertex 7 has member 8. Ideally I would also like to go one step further and identify which level each vertex is on, i.e. vertices 1, 6 and 7 are level 0; vertices 2, 3 and 8 are level 1; vertices 4 and 5 are level 2 (5 is also at level 1 when looked at through vertex 6). Hope that is clearer. cheers
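A sketch of computing those levels with Pregel, assuming the root IDs are known up front; note a vertex reachable from more than one root keeps only its smallest level here, so 5 would come out as level 1 via 6 rather than carrying both levels:

    import org.apache.spark.graphx._

    // Roots start at level 0, everything else effectively at infinity.
    val roots = Set(1L, 6L, 7L)
    val init = graph.mapVertices((id, _) =>
      if (roots.contains(id)) 0 else Int.MaxValue)

    val levels = init.pregel(Int.MaxValue)(
      (_, level, msg) => math.min(level, msg),
      t =>
        if (t.srcAttr != Int.MaxValue && t.srcAttr + 1 < t.dstAttr)
          Iterator((t.dstId, t.srcAttr + 1))
        else Iterator.empty,
      (a, b) => math.min(a, b)
    )
    levels.vertices.collect().foreach(println) // (vertexId, level)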