Hi guys, I'm just getting started with the CloudSuite benchmarks and am having some trouble running the in-memory-analytics benchmark: it fails with a `java.lang.OutOfMemoryError: Java heap space` exception. Any suggestions on what I should change to make this work? How much heap space is sufficient? Also, do the changes go in the container or on the host? Thanks! The command I ran and the full output follow.
```
$ sudo docker run --rm --volumes-from data cloudsuite/in-memory-analytics /data/ml-latest /data/myratings.csv
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/02/29 19:37:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/29 19:37:59 INFO Slf4jLogger: Slf4jLogger started
16/02/29 19:37:59 INFO Remoting: Starting remoting
16/02/29 19:37:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:50305]
16/02/29 19:37:59 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/02/29 19:38:00 INFO FileInputFormat: Total input paths to process : 1
16/02/29 19:38:00 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/02/29 19:38:00 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/02/29 19:38:00 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/02/29 19:38:00 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/02/29 19:38:00 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/02/29 19:38:01 INFO FileInputFormat: Total input paths to process : 1
Got 22884377 ratings from 247753 users on 33670 movies.
[Stage 9:=============================> (2 + 2) / 4]16/02/29 19:38:25 WARN MemoryStore: Not enough space to cache rdd_23_2 in memory! (computed 99.2 MB so far)
[Stage 11:> (0 + 4) / 4]16/02/29 19:38:33 WARN MemoryStore: Not enough space to cache rdd_29_2 in memory! (computed 43.2 MB so far)
16/02/29 19:38:33 WARN MemoryStore: Not enough space to cache rdd_29_3 in memory! (computed 43.2 MB so far)
[Stage 11:==============> (1 + 3) / 4]16/02/29 19:38:33 WARN MemoryStore: Not enough space to cache rdd_29_1 in memory! (computed 43.2 MB so far)
[Stage 12:> (0 + 16) / 19]16/02/29 19:38:36 WARN MemoryStore: Not enough space to cache rdd_31_9 in memory! (computed 6.2 MB so far)
16/02/29 19:38:36 WARN MemoryStore: Not enough space to cache rdd_31_12 in memory! (computed 6.2 MB so far)
16/02/29 19:38:37 WARN MemoryStore: Not enough space to cache rdd_31_10 in memory! (computed 6.2 MB so far)
16/02/29 19:38:37 WARN MemoryStore: Not enough space to cache rdd_31_6 in memory! (computed 6.2 MB so far)
Training: 13731669, validation: 4574414, test: 4578305
[Stage 14:===========================================> (3 + 1) / 4]16/02/29 19:38:42 WARN MemoryStore: Not enough space to cache rdd_23_2 in memory! (computed 99.2 MB so far)
[Stage 15:> (0 + 4) / 4]16/02/29 19:38:45 WARN MemoryStore: Not enough space to cache rdd_35_3 in memory! (computed 19.8 MB so far)
16/02/29 19:38:45 WARN CacheManager: Persisting partition rdd_35_3 to disk instead.
[Stage 16:> (0 + 16) / 16]16/02/29 19:38:56 ERROR Executor: Exception in task 6.0 in stage 16.0 (TID 179)
java.lang.OutOfMemoryError: Java heap space
    at scala.collection.mutable.ArrayBuilder$ofInt.mkArray(ArrayBuilder.scala:320)
    at scala.collection.mutable.ArrayBuilder$ofInt.resize(ArrayBuilder.scala:326)
    at scala.collection.mutable.ArrayBuilder$ofInt.ensureSize(ArrayBuilder.scala:338)
    at scala.collection.mutable.ArrayBuilder$ofInt.$plus$eq(ArrayBuilder.scala:343)
    at scala.collection.mutable.ArrayBuilder$ofInt.$plus$eq(ArrayBuilder.scala:313)
    at org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.add(ALS.scala:851)
    at org.apache.spark.ml.recommendation.ALS$$anonfun$15$$anonfun$apply$11.apply(ALS.scala:1066)
    at org.apache.spark.ml.recommendation.ALS$$anonfun$15$$anonfun$apply$11.apply(ALS.scala:1065)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at org.apache.spark.util.collection.CompactBuffer$$anon$1.foreach(CompactBuffer.scala:115)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at org.apache.spark.util.collection.CompactBuffer.foreach(CompactBuffer.scala:30)
    at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1065)
    at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
[... eight further java.lang.OutOfMemoryError: Java heap space stack traces elided: ERROR Executor for tasks 3.0, 1.0 and 7.0 in stage 16.0 (TIDs 176, 174, 180), ERROR SparkUncaughtExceptionHandler for executor task launch workers 3, 11, 4 and 1, and WARN TaskSetManager "Lost task 7.0 in stage 16.0 (TID 180, localhost)", all with the same ALS/MemoryStore frames as above ...]
16/02/29 19:39:03 ERROR TaskSetManager: Task 7 in stage 16.0 failed 1 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 16.0 failed 1 times, most recent failure: Lost task 7.0 in stage 16.0 (TID 180, localhost): java.lang.OutOfMemoryError: Java heap space
    [... same stack trace as above elided ...]
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1848)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
    at org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:550)
    at org.apache.spark.mllib.recommendation.ALS.run(ALS.scala:239)
    at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:328)
    at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:346)
    at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$mcVD$sp$1.apply$mcVI$sp(MovieLensALS.scala:101)
    at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$mcVD$sp$1.apply(MovieLensALS.scala:100)
    at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$mcVD$sp$1.apply(MovieLensALS.scala:100)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply$mcVD$sp(MovieLensALS.scala:100)
    at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply(MovieLensALS.scala:100)
    at MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply(MovieLensALS.scala:100)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at MovieLensALS$$anonfun$main$1.apply$mcVI$sp(MovieLensALS.scala:100)
    at MovieLensALS$$anonfun$main$1.apply(MovieLensALS.scala:100)
    at MovieLensALS$$anonfun$main$1.apply(MovieLensALS.scala:100)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at MovieLensALS$.main(MovieLensALS.scala:100)
    at MovieLensALS.main(MovieLensALS.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.OutOfMemoryError: Java heap space
    [... same stack trace as above elided ...]
16/02/29 19:39:03 WARN QueuedThreadPool: 10 threads could not be stopped
16/02/29 19:39:04 WARN QueuedThreadPool: 7 threads could not be stopped
```
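In case it helps to show what I've already looked at: the Spark docs say the JVM heap is controlled by `spark-submit`'s `--driver-memory` and `--executor-memory` options, so my guess (unverified) is something like the following, assuming the CloudSuite image forwards any arguments after the ratings file on to `spark-submit`. The sizes are arbitrary starting points, not values I know to be sufficient.

```sh
# Guess: pass Spark memory options through the container, assuming the
# entrypoint forwards extra arguments to spark-submit (unverified).
# The benchmark appears to run Spark in local mode (tasks on localhost),
# where executors live inside the driver JVM, so --driver-memory should
# be the setting that actually grows the heap.
sudo docker run --rm --volumes-from data cloudsuite/in-memory-analytics \
    /data/ml-latest /data/myratings.csv \
    --driver-memory 4g --executor-memory 4g
```

Does that look right, and is 4g in the right ballpark for the ml-latest dataset (22.8M ratings)?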

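On the container-vs-host part of my question: my understanding (please correct me) is that Docker does not cap a container's memory unless asked to, so the host should only need a change if the machine itself is short on RAM or an explicit limit was set. If a Docker-level limit turns out to be the constraint, I assume it would be raised like this (hypothetical sizes again):

```sh
# Hypothetical: give the container an explicit memory cap comfortably
# above the Spark heap, in case a Docker-level limit is the real constraint.
sudo docker run --rm -m 8g --volumes-from data cloudsuite/in-memory-analytics \
    /data/ml-latest /data/myratings.csv --driver-memory 6g
```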