Hi,

I have a Java memory issue with Spark: the same application that runs fine on my 8GB 
Mac crashes on my 72GB Ubuntu server...

I have changed the memory settings in the conf file, but it looks like Spark does not 
pick them up, so I wonder whether my issue is with the driver or with the executor.

I set:

spark.driver.memory             20g
spark.executor.memory           20g

Whatever I do, the crash always happens at the same spot in the app, which 
makes me think it is a driver problem.
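
To check whether the 20g actually reaches the driver JVM, I can read the settings back 
at runtime. This is only a minimal sketch (class name, app name and the MB conversion 
are placeholders, not my real code):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConfCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("conf-check");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // What the driver thinks it was given.
        System.out.println("spark.driver.memory   = "
                + sc.getConf().get("spark.driver.memory", "<not set>"));
        System.out.println("spark.executor.memory = "
                + sc.getConf().get("spark.executor.memory", "<not set>"));
        // maxMemory() reflects the -Xmx the driver JVM was really started with;
        // if it stays far below 20g, the setting never reached the driver.
        System.out.println("driver Runtime.maxMemory() = "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        sc.stop();
    }
}

If the printed maxMemory() does not move when I change the conf file, the driver JVM 
never saw the new setting.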

The exception I get is:

16/07/13 20:36:30 WARN TaskSetManager: Lost task 0.0 in stage 7.0 (TID 208, micha.nc.rr.com): java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
    at java.nio.CharBuffer.allocate(CharBuffer.java:335)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:810)
    at org.apache.hadoop.io.Text.decode(Text.java:412)
    at org.apache.hadoop.io.Text.decode(Text.java:389)
    at org.apache.hadoop.io.Text.toString(Text.java:280)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
    at org.apache.spark.sql.execution.datasources.json.JSONRelation$$anonfun$org$apache$spark$sql$execution$datasources$json$JSONRelation$$createBaseRdd$1.apply(JSONRelation.scala:105)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
    at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$23.apply(RDD.scala:1135)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
    at org.apache.spark.rdd.RDD$$anonfun$treeAggregate$1$$anonfun$24.apply(RDD.scala:1136)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
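
The treeAggregate over JSONRelation.createBaseRdd in the trace looks like the 
schema-inference pass that sqlContext.read().json(...) runs over the whole input 
before building the DataFrame. For context, a minimal sketch of the kind of call 
that produces this trace (the path and names are placeholders, not my actual code):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class JsonLoadSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("json-load-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);
        // read().json() first scans the input to infer the schema
        // (the treeAggregate seen in the stack trace), then builds the DataFrame.
        DataFrame df = sqlContext.read().json("/path/to/input.json");
        df.printSchema();
        sc.stop();
    }
}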

I added a small memory "dumper" to my app (a sketch of it follows the numbers 
below). At the beginning of the run, it says:

**  Free ......... 1,413,566
**  Allocated .... 1,705,984
**  Max .......... 16,495,104
**> Total free ... 16,202,686

Just before the crash, it says:

**  Free ......... 1,461,633
**  Allocated .... 1,786,880
**  Max .......... 16,495,104
**> Total free ... 16,169,857
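
The dumper boils down to something like the following (a simplified sketch, not the 
exact code; the figures are assumed to come from Runtime.getRuntime() and to be 
reported in KB):

import java.text.NumberFormat;

public class MemoryDumper {
    // Prints the driver JVM heap figures in the same style as the output above.
    public static void dump() {
        Runtime rt = Runtime.getRuntime();
        NumberFormat nf = NumberFormat.getIntegerInstance();
        long free = rt.freeMemory() / 1024;        // free space inside the currently allocated heap
        long allocated = rt.totalMemory() / 1024;  // heap currently reserved by the JVM
        long max = rt.maxMemory() / 1024;          // hard ceiling (-Xmx, i.e. spark.driver.memory)
        long totalFree = max - allocated + free;   // room the heap can still grow into, plus current free
        System.out.println("**  Free ......... " + nf.format(free));
        System.out.println("**  Allocated .... " + nf.format(allocated));
        System.out.println("**  Max .......... " + nf.format(max));
        System.out.println("**> Total free ... " + nf.format(totalFree));
    }
}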
