Sent: Saturday, July 11, 2015 03:58
To: Ted Yu; Robin East; user
Subject: Re: Spark GraphX memory requirements + java.lang.OutOfMemoryError: GC
overhead limit exceeded
Hello again.
So I could compute triangle numbers when running the code from the spark
shell without workers (with the --driver-memory 15g option), but with workers
I get errors. So I run the spark shell:
./bin/spark-shell --master spark://192.168.0.31:7077 --executor-memory
6900m --driver-memory 15g
and workers
Ok, but what does it mean? I did not change the core files of Spark, so is
it a bug there?
PS: on small datasets (500 Mb) I have no problem.
On 25.06.2015 18:02, Ted Yu yuzhih...@gmail.com wrote:
The assertion failure from TriangleCount.scala corresponds with the
following lines:
You’ll get this issue if you just take the first 2000 lines of that file. The
problem is triangleCount() expects srcId < dstId, which is not the case in the
file (e.g. vertex 28). You can get round this by calling
graph.convertToCanonicalEdges(), which removes bi-directional edges and ensures
srcId < dstId.
Yep, I already found it. So I added 1 line:
val graph = GraphLoader.edgeListFile(sc, ...)
val newgraph = graph.convertToCanonicalEdges()
and could successfully count triangles on newgraph. Next will test it on
bigger (several Gb) networks.
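For what it's worth, the effect of convertToCanonicalEdges() can be sketched in plain Scala, without Spark (the object name and example edge list here are purely illustrative, not from the Spark source): orient every edge so the smaller vertex id comes first, then deduplicate, so each undirected edge survives exactly once with srcId < dstId.

```scala
object CanonicalEdgesSketch {
  // Orient each edge so the smaller id is the source, then deduplicate:
  // the bi-directional pair (1,2)/(2,1) collapses to the single edge (1,2).
  def canonicalize(edges: Seq[(Long, Long)]): Seq[(Long, Long)] =
    edges
      .map { case (src, dst) => if (src <= dst) (src, dst) else (dst, src) }
      .distinct

  def main(args: Array[String]): Unit = {
    val edges = Seq((1L, 2L), (2L, 1L), (2L, 3L))
    println(canonicalize(edges)) // List((1,2), (2,3))
  }
}
```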
I am using Spark 1.3 and 1.4 but haven't seen
Hello!
I am trying to compute the number of triangles with GraphX, but I get a memory
error or heap-size error, even though the dataset is very small (1Gb). I run
the code in spark-shell on a machine with 16Gb RAM (I also tried with 2 workers
on separate machines with 8Gb RAM each). So I have 15x more memory than the
dataset size.
The assertion failure from TriangleCount.scala corresponds with the
following lines:
g.outerJoinVertices(counters) {
  (vid, _, optCounter: Option[Int]) =>
    val dblCount = optCounter.getOrElse(0)
    // double count should be even (divisible by two)
    assert((dblCount & 1) == 0)
    dblCount / 2
}
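That assert is just an evenness test: (n & 1) reads the lowest bit, which is 0 for even numbers, and the per-vertex counter tallies each triangle twice, so it must be divisible by two before halving. A minimal stand-alone sketch of that step (object and method names here are illustrative, not from the Spark source):

```scala
object DoubleCountSketch {
  // The per-vertex triangle counter is accumulated twice per triangle,
  // so it must be even; the real triangle count is the counter halved.
  def triangleCount(dblCount: Int): Int = {
    assert((dblCount & 1) == 0, s"double count $dblCount should be even")
    dblCount / 2
  }

  def main(args: Array[String]): Unit = {
    // a vertex lying on 3 triangles is counted 6 times
    println(triangleCount(6)) // 3
  }
}
```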