Re: GraphX create graph with multiple node attributes
Robineast wrote
> 2) let GraphX supply a null instead
> val graph = Graph(vertices, edges) // vertices found in 'edges' but
> not in 'vertices' will be set to null

Thank you! This method works.

As a follow up (sorry I'm new to this, don't know if I should start a new thread?): if I have vertices that are in 'vertices' but not in 'edges' (the opposite of what you mention), will they be counted as part of the graph but with 0 edges, or will they be dropped from the graph?

When I count the number of vertices with vertices.count, I get 13,628 nodes. When I count graph vertices with graph.vertices.count, I get 12,274 nodes. When I count vertices with 1+ degrees with graph.degrees.count I get 10,091 vertices... What am I dropping each time?

Thanks again!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-create-graph-with-multiple-node-attributes-tp24827p24830.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: GraphX create graph with multiple node attributes
Here is all of my code. My first post had a simplified version. As I post this, I realize one issue may be that when I convert my Ids to long (I define a pageHash function to convert string Ids to long), the nodeIds are no longer the same between the 'vertices' object and the 'edges' object. Do you think this is what is causing the issue?
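One way to test the suspicion that the IDs no longer line up is to compare the set of IDs in 'vertices' with the set of IDs used as edge endpoints. The following is only a sketch: it assumes a spark-shell session where `sc` is defined, and the two small RDDs are made-up stand-ins, since the poster's actual code and data do not appear in the thread.

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Toy stand-ins for the real RDDs (assumption: the poster's data is not shown).
val vertices: RDD[(VertexId, String)] =
  sc.makeRDD(Seq((1L, "a"), (2L, "b"), (3L, "c")))
val edges: RDD[Edge[Int]] =
  sc.makeRDD(Seq(Edge(1L, 2L, 1), Edge(2L, 99L, 1)))

// Distinct IDs on each side.
val vertexIds: RDD[VertexId] = vertices.keys.distinct()
val edgeIds: RDD[VertexId] =
  edges.flatMap(e => Seq(e.srcId, e.dstId)).distinct()

// IDs referenced by an edge that have no record in 'vertices'.
val missingFromVertices = edgeIds.subtract(vertexIds)
// IDs in 'vertices' that no edge touches (isolated vertices).
val untouchedByEdges = vertexIds.subtract(edgeIds)

println(s"edge endpoints with no vertex record: ${missingFromVertices.count}")
println(s"vertex IDs not used by any edge: ${untouchedByEdges.count}")
```

If the string IDs were hashed differently on the two sides, the first count would be expected to be large relative to the number of edges. A small second count only indicates isolated vertices, which are valid in GraphX.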
Re: GraphX create graph with multiple node attributes
Vertices that aren't connected to anything are perfectly valid, e.g.

  import org.apache.spark.graphx._

  val vertices = sc.makeRDD(Seq((1L,1),(2L,1),(3L,1)))
  val edges = sc.makeRDD(Seq(Edge(1L,2L,1)))
  val g = Graph(vertices, edges)

g.vertices.count gives 3

Not sure why vertices appear to be dropping off. Could you show your full code?

g.degrees.count gives 2 - as the scaladocs mention: 'The degree of each vertex in the graph. @note Vertices with no edges are not returned in the resulting RDD'

- Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action
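The three counts reported in the thread can be reproduced on toy data to see what each step removes. This is a sketch under assumptions: it expects a spark-shell session where `sc` is defined, and the data below is invented so that some vertex IDs repeat and one vertex has no edges. It relies on the documented behaviour of `Graph(vertices, edges)` that duplicate vertices are picked arbitrarily, so each ID appears once in `g.vertices`.

```scala
import org.apache.spark.graphx._

// Toy data (assumption): 5 vertex records but only 3 distinct IDs,
// and vertex 3 is not an endpoint of any edge.
val vertices = sc.makeRDD(Seq((1L, 1), (2L, 1), (2L, 1), (3L, 1), (3L, 1)))
val edges = sc.makeRDD(Seq(Edge(1L, 2L, 1)))
val g = Graph(vertices, edges)

val records = vertices.count                      // raw input records
val distinctIds = vertices.keys.distinct().count  // unique vertex IDs in the input
val inGraph = g.vertices.count                    // one entry per ID in the graph
val withEdges = g.degrees.count                   // vertices with at least one edge

println(s"records=$records distinctIds=$distinctIds " +
  s"inGraph=$inGraph withEdges=$withEdges")
println(s"records sharing an ID with another record: ${records - distinctIds}")
println(s"vertices with no edges: ${inGraph - withEdges}")
```

Applied to the numbers in the question, and assuming the graph was built the same way, the gap between vertices.count and graph.vertices.count (13,628 - 12,274 = 1,354) would point to repeated IDs in 'vertices', while the gap between graph.vertices.count and graph.degrees.count (12,274 - 10,091 = 2,183) would be vertices with no edges.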
Re: GraphX create graph with multiple node attributes
Have you checked to make sure that your hashing function doesn't have any collisions? Node ids have to be unique, so if you're getting repeated ids out of your hasher, it could certainly lead to dropping of duplicate ids, and therefore loss of vertices.

On Sat, Sep 26, 2015 at 10:37 AM JJ wrote:

> Here is all of my code. My first post had a simplified version. As I post
> this, I realize one issue may be that when I convert my Ids to long (I
> define a pageHash function to convert string Ids to long), the nodeIds are
> no longer the same between the 'vertices' object and the 'edges' object. Do
> you think this is what is causing the issue?
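The collision check suggested in this reply could look roughly like the sketch below. The poster's `pageHash` is not shown in the thread, so the version here is a hypothetical stand-in based on `String.hashCode`, and the string IDs are made up. The example uses plain Scala collections so it runs without Spark; the same grouping idea carries over to an RDD of IDs.

```scala
// Hypothetical stand-in for the poster's pageHash, which is not shown in the thread.
def pageHash(id: String): Long = id.hashCode.toLong

// Toy string IDs (assumption); "Aa" and "BB" have the same String.hashCode.
val pageIds = Seq("Aa", "BB", "home", "about")

// Group the distinct string IDs by their hash and keep only the hashes
// that more than one distinct ID maps to.
val collisions: Map[Long, Seq[String]] =
  pageIds.distinct.groupBy(pageHash).filter { case (_, ids) => ids.size > 1 }

collisions.foreach { case (hash, ids) =>
  println(s"hash $hash is shared by: ${ids.mkString(", ")}")
}
```

An empty `collisions` map means every distinct string ID received its own Long. Any entry in it marks pages that would be merged into a single vertex once the hashed values are used as vertex IDs.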