Re: GraphX create graph with multiple node attributes

2015-09-26 Thread JJ
Robineast wrote
> 2) let GraphX supply a null instead
>  val graph = Graph(vertices, edges)  // vertices found in 'edges' but
> not in 'vertices' will be set to null 

Thank you! This method works.

As a follow-up (sorry, I'm new to this; should I start a new thread?): if I
have vertices that are in 'vertices' but not in 'edges' (the opposite of what
you mention), will they be counted as part of the graph but with 0 edges, or
will they be dropped from the graph? When I count vertices with
vertices.count, I get 13,628 nodes; with graph.vertices.count, I get 12,274
nodes; and with graph.degrees.count (vertices with 1+ degrees), I get 10,091
vertices. What am I dropping each time?

Thanks again!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-create-graph-with-multiple-node-attributes-tp24827p24830.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: GraphX create graph with multiple node attributes

2015-09-26 Thread JJ
Here is all of my code; my first post had a simplified version. As I post
this, I realize one issue may be that when I convert my ids to Long (I
define a pageHash function to convert string ids to Long), the node ids are
no longer the same between the 'vertices' object and the 'edges' object. Do
you think this is what is causing the issue?
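For what it's worth, here is a minimal sketch of what a consistent pageHash might look like (the thread doesn't show the real implementation, so MurmurHash3 and all the names here are assumptions). The key point is that the same function must be applied everywhere an id appears, both the vertex ids and the edge endpoints:

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical stand-in for the pageHash mentioned above; the real
// implementation is not shown in the thread.
def pageHash(id: String): Long = MurmurHash3.stringHash(id).toLong

// The same hash must be used for vertex ids AND edge endpoints,
// otherwise the ids in 'vertices' and 'edges' will never line up.
val vertex = (pageHash("pageA"), "pageA")                // vertex id
val edge   = (pageHash("pageA"), pageHash("pageB"))      // src and dst ids
assert(vertex._1 == edge._1)  // consistent hashing keeps them aligned
```

If 'vertices' and 'edges' are built in separate places, it is easy to hash in one and forget to hash in the other, which would produce exactly the mismatch you describe.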





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-create-graph-with-multiple-node-attributes-tp24827p24832.html



Re: GraphX create graph with multiple node attributes

2015-09-26 Thread Robineast
Vertices that aren't connected to anything are perfectly valid e.g.

import org.apache.spark.graphx._

val vertices = sc.makeRDD(Seq((1L,1),(2L,1),(3L,1)))
val edges = sc.makeRDD(Seq(Edge(1L,2L,1)))

val g = Graph(vertices, edges)
g.vertices.count

gives 3

Not sure why vertices appear to be dropping off. Could you show your full
code?

g.degrees.count gives 2 - as the scaladocs mention 'The degree of each
vertex in the graph. @note Vertices with no edges are not returned in the
resulting RDD'
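As an aside (not something from the thread): since degrees omits zero-edge vertices, you can find the isolated ones by subtracting the ids that degrees returns from the full vertex set. A sketch, reusing the g built above:

```scala
// Vertices that are in the graph but have no incident edges:
// 'degrees' omits them, so subtracting its keys leaves the isolated ones.
val isolated = g.vertices.subtractByKey(g.degrees)
isolated.count  // 1 in the example above (just vertex 3L)
```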






-
Robin East 
Spark GraphX in Action Michael Malak and Robin East 
Manning Publications Co. 
http://www.manning.com/books/spark-graphx-in-action

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-create-graph-with-multiple-node-attributes-tp24827p24831.html



Re: GraphX create graph with multiple node attributes

2015-09-26 Thread Nick Peterson
Have you checked that your hashing function doesn't have any collisions?
Vertex ids have to be unique, so if your hasher produces repeated ids,
duplicates will be dropped and vertices lost.
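To make the risk concrete, here is a small illustration (not from the thread) of why a hash built on String.hashCode can silently merge vertices: Java's 32-bit string hash has well-known collisions.

```scala
// "Aa" and "BB" are a classic String.hashCode collision: both hash to 2112.
// A pageHash built on hashCode would give two distinct pages the same
// vertex id, and GraphX would keep only one of them.
def naiveHash(id: String): Long = id.hashCode.toLong

assert("Aa" != "BB")
assert(naiveHash("Aa") == naiveHash("BB"))  // both 2112
```

A wider hash (or keeping a (hash, original id) mapping and checking it for duplicates before building the graph) makes such silent merges far less likely.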

On Sat, Sep 26, 2015 at 10:37 AM JJ  wrote:

> Here is all of my code. My first post had a simplified version. As I post
> this, I realize one issue may be that when I convert my Ids to long (I
> define a pageHash function to convert string Ids to long), the nodeIds are
> no longer the same between the 'vertices' object and the 'edges' object. Do
> you think this is what is causing the issue?