Re: GraphX: .edges.distinct().count() is 10?
This is caused by https://issues.apache.org/jira/browse/SPARK-1188. I think the fix will be in the next release. Until then, copy each edge before calling distinct:

    g.edges.map(_.copy()).distinct.count

On Wed, Apr 23, 2014 at 2:26 AM, Ryan Compton compton.r...@gmail.com wrote:

Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/.txt

This code:

    println(g.numEdges)
    println(g.numVertices)
    println(g.edges.distinct().count())

gave me 1 9294 2
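For readers wondering why a copy() would change the result of distinct: the sketch below assumes the problem behind SPARK-1188 is an iterator that hands back the same mutable edge object on every step, so anything that buffers the elements ends up holding several references to one object. It is plain Scala with no Spark dependency; MutableEdge and reusingIterator are hypothetical stand-ins made up for the illustration, not GraphX classes.

```scala
// Hypothetical stand-in for an edge; NOT the GraphX Edge class.
final case class MutableEdge(var srcId: Long, var dstId: Long)

object ReuseDemo {
  // Assumed failure mode: the iterator overwrites and returns ONE shared instance.
  def reusingIterator(pairs: Seq[(Long, Long)]): Iterator[MutableEdge] = {
    val shared = MutableEdge(0L, 0L)
    pairs.iterator.map { case (src, dst) =>
      shared.srcId = src
      shared.dstId = dst
      shared // same object every time
    }
  }

  val pairs: Seq[(Long, Long)] =
    Seq((394365859L, 136153151L), (589404147L, 1361045425L))

  // Buffering without copying: both slots point at the same object,
  // so distinct collapses them into one element.
  val withoutCopy: Int = reusingIterator(pairs).toVector.distinct.size

  // Copying each element as it is produced snapshots its fields
  // before the next step overwrites them.
  val withCopy: Int = reusingIterator(pairs).map(_.copy()).toVector.distinct.size
}
```

Under that assumption, map(_.copy()) works because the copy is taken while the shared object still holds the current edge's values.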
Re: GraphX: .edges.distinct().count() is 10?
I wasn't able to reproduce this with a small test file, but I did change the file parsing to use x(1).toLong instead of x(2).toLong. Did you mean to take the third column rather than the second? If so, would you mind posting a larger sample of the file, or even the whole file if possible?

Here's the test that succeeded:

    test("graph.edges.distinct.count") {
      withSpark { sc =>
        val edgeFullStrRDD: RDD[String] = sc.parallelize(List(
          "394365859\t136153151", "589404147\t1361045425"))
        val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
          .map(x => (x(0).toLong, x(1).toLong))
        val g = Graph.fromEdgeTuples(edgeTupRDD, defaultValue = 123,
          uniqueEdges = Option(CanonicalRandomVertexCut))
        assert(edgeTupRDD.distinct.count() === 2)
        assert(g.numEdges === 2)
        assert(g.edges.distinct.count() === 2)
      }
    }

Ankur
http://www.ankurdave.com/
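On the column question above: split returns a zero-indexed array, so x(1) is the second column and x(2) is the third. A minimal sketch in plain Scala, using one of the sample lines from the test; the object name ColumnIndexDemo and the use of lift to probe for a third column are illustrative choices, not part of the original code.

```scala
object ColumnIndexDemo {
  // One tab-separated line with two columns, as in the test data.
  val line: String = "394365859\t136153151"

  val cols: Array[String] = line.split("\t")

  // Columns 0 and 1 are the first and second fields.
  val edge: (Long, Long) = (cols(0).toLong, cols(1).toLong)

  // cols(2) would throw ArrayIndexOutOfBoundsException on a two-column line;
  // lift returns None instead of throwing.
  val thirdColumn: Option[String] = cols.lift(2)
}
```

If the real file has three or more columns, x(2) is valid and simply selects a different field than x(1).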
Re: GraphX: .edges.distinct().count() is 10?
Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/.txt

This code:

    println(g.numEdges)
    println(g.numVertices)
    println(g.edges.distinct().count())

gave me 1 9294 2

On Tue, Apr 22, 2014 at 5:14 PM, Ankur Dave ankurd...@gmail.com wrote:

I wasn't able to reproduce this with a small test file, but I did change the file parsing to use x(1).toLong instead of x(2).toLong. Did you mean to take the third column rather than the second? If so, would you mind posting a larger sample of the file, or even the whole file if possible?

Here's the test that succeeded:

    test("graph.edges.distinct.count") {
      withSpark { sc =>
        val edgeFullStrRDD: RDD[String] = sc.parallelize(List(
          "394365859\t136153151", "589404147\t1361045425"))
        val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
          .map(x => (x(0).toLong, x(1).toLong))
        val g = Graph.fromEdgeTuples(edgeTupRDD, defaultValue = 123,
          uniqueEdges = Option(CanonicalRandomVertexCut))
        assert(edgeTupRDD.distinct.count() === 2)
        assert(g.numEdges === 2)
        assert(g.edges.distinct.count() === 2)
      }
    }

Ankur