Hi guys,

I'm just getting started with the CloudSuite benchmarks and am having trouble running the in-memory-analytics benchmark: it fails with a 'java.lang.OutOfMemoryError: Java heap space'. Any suggestions on how to configure things so it completes? How much heap space is sufficient for this dataset, and do the changes need to be made inside the container, on the host, or both?
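My guess is that the fix is to raise Spark's driver/executor heap via spark-submit options. Assuming the image forwards any extra arguments after the dataset paths straight to spark-submit (I haven't verified this, and the 4g figures below are just a guess on my part), I would try something like:

sudo docker run --rm --volumes-from data cloudsuite/in-memory-analytics \
    /data/ml-latest /data/myratings.csv --driver-memory 4g --executor-memory 4g

Thanks! Here is the exact command I ran, followed by the full output: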

sudo docker run --rm --volumes-from data cloudsuite/in-memory-analytics \
    /data/ml-latest /data/myratings.csv
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/02/29 19:37:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/29 19:37:59 INFO Slf4jLogger: Slf4jLogger started
16/02/29 19:37:59 INFO Remoting: Starting remoting
16/02/29 19:37:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:50305]
16/02/29 19:37:59 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/02/29 19:38:00 INFO FileInputFormat: Total input paths to process : 1
16/02/29 19:38:00 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/02/29 19:38:00 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/02/29 19:38:00 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/02/29 19:38:00 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/02/29 19:38:00 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/02/29 19:38:01 INFO FileInputFormat: Total input paths to process : 1
Got 22884377 ratings from 247753 users on 33670 movies.
16/02/29 19:38:25 WARN MemoryStore: Not enough space to cache rdd_23_2 in memory! (computed 99.2 MB so far)
16/02/29 19:38:33 WARN MemoryStore: Not enough space to cache rdd_29_2 in memory! (computed 43.2 MB so far)
16/02/29 19:38:33 WARN MemoryStore: Not enough space to cache rdd_29_3 in memory! (computed 43.2 MB so far)
16/02/29 19:38:33 WARN MemoryStore: Not enough space to cache rdd_29_1 in memory! (computed 43.2 MB so far)
16/02/29 19:38:36 WARN MemoryStore: Not enough space to cache rdd_31_9 in memory! (computed 6.2 MB so far)
16/02/29 19:38:36 WARN MemoryStore: Not enough space to cache rdd_31_12 in memory! (computed 6.2 MB so far)
16/02/29 19:38:37 WARN MemoryStore: Not enough space to cache rdd_31_10 in memory! (computed 6.2 MB so far)
16/02/29 19:38:37 WARN MemoryStore: Not enough space to cache rdd_31_6 in memory! (computed 6.2 MB so far)
Training: 13731669, validation: 4574414, test: 4578305
16/02/29 19:38:42 WARN MemoryStore: Not enough space to cache rdd_23_2 in memory! (computed 99.2 MB so far)
16/02/29 19:38:45 WARN MemoryStore: Not enough space to cache rdd_35_3 in memory! (computed 19.8 MB so far)
16/02/29 19:38:45 WARN CacheManager: Persisting partition rdd_35_3 to disk instead.
16/02/29 19:38:56 ERROR Executor: Exception in task 6.0 in stage 16.0 (TID 179)
java.lang.OutOfMemoryError: Java heap space
        at scala.collection.mutable.ArrayBuilder$ofInt.mkArray(ArrayBuilder.scala:320)
        at scala.collection.mutable.ArrayBuilder$ofInt.resize(ArrayBuilder.scala:326)
        at scala.collection.mutable.ArrayBuilder$ofInt.ensureSize(ArrayBuilder.scala:338)
        at scala.collection.mutable.ArrayBuilder$ofInt.$plus$eq(ArrayBuilder.scala:343)
        at scala.collection.mutable.ArrayBuilder$ofInt.$plus$eq(ArrayBuilder.scala:313)
        at org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.add(ALS.scala:851)
        at org.apache.spark.ml.recommendation.ALS$$anonfun$15$$anonfun$apply$11.apply(ALS.scala:1066)
        at org.apache.spark.ml.recommendation.ALS$$anonfun$15$$anonfun$apply$11.apply(ALS.scala:1065)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at org.apache.spark.util.collection.CompactBuffer$$anon$1.foreach(CompactBuffer.scala:115)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at org.apache.spark.util.collection.CompactBuffer.foreach(CompactBuffer.scala:30)
        at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1065)
        at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
16/02/29 19:39:01 ERROR Executor: Exception in task 3.0 in stage 16.0 (TID 176)
java.lang.OutOfMemoryError: Java heap space
        ... (near-identical stack trace in ALS$UncompressedInBlockBuilder snipped) ...
16/02/29 19:39:00 ERROR Executor: Exception in task 1.0 in stage 16.0 (TID 174)
java.lang.OutOfMemoryError: Java heap space
        ... (snipped) ...
16/02/29 19:39:00 ERROR Executor: Exception in task 7.0 in stage 16.0 (TID 180)
java.lang.OutOfMemoryError: Java heap space
        ... (snipped) ...
16/02/29 19:39:01 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-3,5,main]
java.lang.OutOfMemoryError: Java heap space
        ... (snipped) ...
16/02/29 19:39:01 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-11,5,main]
java.lang.OutOfMemoryError: Java heap space
        ... (snipped) ...
16/02/29 19:39:01 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-4,5,main]
java.lang.OutOfMemoryError: Java heap space
        ... (snipped) ...
16/02/29 19:39:01 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-1,5,main]
java.lang.OutOfMemoryError: Java heap space
        ... (snipped) ...
16/02/29 19:39:02 WARN TaskSetManager: Lost task 7.0 in stage 16.0 (TID 180, localhost): java.lang.OutOfMemoryError: Java heap space
        ... (snipped) ...

16/02/29 19:39:03 ERROR TaskSetManager: Task 7 in stage 16.0 failed 1 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 16.0 failed 1 times, most recent failure: Lost task 7.0 in stage 16.0 (TID 180, localhost): java.lang.OutOfMemoryError: Java heap space
        ... (same stack trace as above, snipped) ...

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1848)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
        at org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:550)
        at org.apache.spark.mllib.recommendation.ALS.run(ALS.scala:239)
        at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:328)
        at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:346)
        at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$mcVD$sp$1.apply$mcVI$sp(MovieLensALS.scala:101)
        at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$mcVD$sp$1.apply(MovieLensALS.scala:100)
        at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$mcVD$sp$1.apply(MovieLensALS.scala:100)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply$mcVD$sp(MovieLensALS.scala:100)
        at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply(MovieLensALS.scala:100)
        at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply(MovieLensALS.scala:100)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at MovieLensALS$$anonfun$main$1.apply$mcVI$sp(MovieLensALS.scala:100)
        at MovieLensALS$$anonfun$main$1.apply(MovieLensALS.scala:100)
        at MovieLensALS$$anonfun$main$1.apply(MovieLensALS.scala:100)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at MovieLensALS$.main(MovieLensALS.scala:100)
        at MovieLensALS.main(MovieLensALS.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.OutOfMemoryError: Java heap space
        ... (same ALS$UncompressedInBlockBuilder stack trace as before, snipped) ...
16/02/29 19:39:03 WARN QueuedThreadPool: 10 threads could not be stopped
16/02/29 19:39:04 WARN QueuedThreadPool: 7 threads could not be stopped
