Re: GraphX: .edges.distinct().count() is 10?

2014-04-23 Thread Daniel Darabos
This is caused by https://issues.apache.org/jira/browse/SPARK-1188. I think
the fix will be in the next release. Until then, copy each edge before
calling distinct:

g.edges.map(_.copy()).distinct.count
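
To illustrate why the copy helps, here is a minimal sketch in plain Scala (no Spark). It assumes the problem behind SPARK-1188 is that the edge iterator hands back one mutable object and rewrites it for every edge. MutableEdge, ReuseDemo and reusingIterator are hypothetical names made up for this sketch, not GraphX code.

```scala
// Hypothetical stand-in for a GraphX edge: a case class with mutable fields.
case class MutableEdge(var srcId: Long = 0L, var dstId: Long = 0L)

object ReuseDemo {
  // Assumption for this sketch: the iterator rewrites and returns ONE shared
  // object on every step instead of allocating a new edge each time.
  def reusingIterator(pairs: Seq[(Long, Long)]): Iterator[MutableEdge] = {
    val shared = MutableEdge()
    pairs.iterator.map { case (s, d) =>
      shared.srcId = s
      shared.dstId = d
      shared
    }
  }

  val pairs = Seq((1L, 2L), (3L, 4L), (5L, 6L))

  // Buffering the references first: all three entries are the same object,
  // so they compare equal and distinct keeps a single element.
  val withoutCopy: Int = reusingIterator(pairs).toVector.distinct.size

  // Copying each element as it is produced snapshots its current state,
  // so the three edges stay different.
  val withCopy: Int = reusingIterator(pairs).map(_.copy()).toVector.distinct.size

  def main(args: Array[String]): Unit = {
    println(withoutCopy) // 1
    println(withCopy)    // 3
  }
}
```

In a real job the wrong count would depend on how and when the edges get buffered, so the numbers above only show the mechanism, not what GraphX itself would report.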



On Wed, Apr 23, 2014 at 2:26 AM, Ryan Compton compton.r...@gmail.com wrote:

> Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/.txt
>
> This code:
>
> println(g.numEdges)
> println(g.numVertices)
> println(g.edges.distinct().count())
>
> gave me
>
> 1
> 9294
> 2






Re: GraphX: .edges.distinct().count() is 10?

2014-04-22 Thread Ankur Dave
I wasn't able to reproduce this with a small test file, but I did change
the file parsing to use x(1).toLong instead of x(2).toLong. Did you mean to
take the third column rather than the second?

If so, would you mind posting a larger sample of the file, or even the
whole file if possible?

Here's the test that succeeded:

  test("graph.edges.distinct.count") {
    withSpark { sc =>
      val edgeFullStrRDD: RDD[String] = sc.parallelize(List(
        "394365859\t136153151", "589404147\t1361045425"))
      val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
        .map(x => (x(0).toLong, x(1).toLong))
      val g = Graph.fromEdgeTuples(edgeTupRDD, defaultValue = 123,
        uniqueEdges = Option(CanonicalRandomVertexCut))
      assert(edgeTupRDD.distinct.count() === 2)
      assert(g.numEdges === 2)
      assert(g.edges.distinct.count() === 2)
    }
  }
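
On the column question, a small sketch of how the split indices map to columns. The three-column line below is a hypothetical example, since the real file's layout is not shown in this thread. ColumnDemo is likewise a made-up name.

```scala
object ColumnDemo {
  // Hypothetical tab-separated line with three columns.
  val line = "394365859\t136153151\t7"
  val cols: Array[String] = line.split("\t")

  // Indices are zero-based: cols(1) is the second column, cols(2) the third.
  val second: Long = cols(1).toLong
  val third: Long = cols(2).toLong

  def main(args: Array[String]): Unit = {
    println(second) // 136153151
    println(third)  // 7
  }
}
```

On a line with only two columns, cols(2) would throw an ArrayIndexOutOfBoundsException, which is why the test above reads x(1).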

Ankur http://www.ankurdave.com/


Re: GraphX: .edges.distinct().count() is 10?

2014-04-22 Thread Ryan Compton
Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/.txt

This code:

println(g.numEdges)
println(g.numVertices)
println(g.edges.distinct().count())

gave me

1
9294
2



On Tue, Apr 22, 2014 at 5:14 PM, Ankur Dave ankurd...@gmail.com wrote:
> I wasn't able to reproduce this with a small test file, but I did change the
> file parsing to use x(1).toLong instead of x(2).toLong. Did you mean to take
> the third column rather than the second?
>
> If so, would you mind posting a larger sample of the file, or even the whole
> file if possible?
>
> Here's the test that succeeded:
>
>   test("graph.edges.distinct.count") {
>     withSpark { sc =>
>       val edgeFullStrRDD: RDD[String] = sc.parallelize(List(
>         "394365859\t136153151", "589404147\t1361045425"))
>       val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
>         .map(x => (x(0).toLong, x(1).toLong))
>       val g = Graph.fromEdgeTuples(edgeTupRDD, defaultValue = 123,
>         uniqueEdges = Option(CanonicalRandomVertexCut))
>       assert(edgeTupRDD.distinct.count() === 2)
>       assert(g.numEdges === 2)
>       assert(g.edges.distinct.count() === 2)
>     }
>   }
>
> Ankur