Sorry about this. This was fixed by increasing the allocated memory, as described in the `Tweaking the benchmark` section of the documentation. Thanks!
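For anyone who hits the same error: per the `Tweaking the benchmark` section, the container's entry point should forward extra Spark arguments to spark-submit, so you can raise the driver and executor heap directly on the command line. A minimal sketch (the 4g sizes are only an illustration; pick values that fit your dataset and your host's memory):

    sudo docker run --rm --volumes-from data cloudsuite/in-memory-analytics /data/ml-latest /data/myratings.csv --driver-memory 4g --executor-memory 4g

With the default heap, the ALS training data no longer fits in memory, which is what the MemoryStore warnings and the stage 16 OutOfMemoryError in the log below show.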
On Mon, Feb 29, 2016 at 1:55 PM, Mohammad Ahmad <[email protected]> wrote:
> Hi guys,
>
> I am just getting started with the CloudSuite benchmarks and am having some
> trouble running the in-memory-analytics benchmark. I get a
> 'java.lang.OutOfMemoryError: Java heap space' exception. Any suggestions on
> how to change things to make this work? How much heap space is sufficient?
> Also, will changes be required in the container or on the host? Thanks!
>
> sudo docker run --rm --volumes-from data cloudsuite/in-memory-analytics /data/ml-latest /data/myratings.csv
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 16/02/29 19:37:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 16/02/29 19:37:59 INFO Slf4jLogger: Slf4jLogger started
> 16/02/29 19:37:59 INFO Remoting: Starting remoting
> 16/02/29 19:37:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:50305]
> 16/02/29 19:37:59 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
> 16/02/29 19:38:00 INFO FileInputFormat: Total input paths to process : 1
> 16/02/29 19:38:00 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
> 16/02/29 19:38:00 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
> 16/02/29 19:38:00 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
> 16/02/29 19:38:00 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
> 16/02/29 19:38:00 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
> 16/02/29 19:38:01 INFO FileInputFormat: Total input paths to process : 1
> Got 22884377 ratings from 247753 users on 33670 movies.
> [Stage 9:=============================> (2 + 2) / 4]16/02/29 19:38:25 WARN MemoryStore: Not enough space to cache rdd_23_2 in memory! (computed 99.2 MB so far)
> [Stage 11:> (0 + 4) / 4]16/02/29 19:38:33 WARN MemoryStore: Not enough space to cache rdd_29_2 in memory! (computed 43.2 MB so far)
> 16/02/29 19:38:33 WARN MemoryStore: Not enough space to cache rdd_29_3 in memory! (computed 43.2 MB so far)
> [Stage 11:==============> (1 + 3) / 4]16/02/29 19:38:33 WARN MemoryStore: Not enough space to cache rdd_29_1 in memory! (computed 43.2 MB so far)
> [Stage 12:> (0 + 16) / 19]16/02/29 19:38:36 WARN MemoryStore: Not enough space to cache rdd_31_9 in memory! (computed 6.2 MB so far)
> 16/02/29 19:38:36 WARN MemoryStore: Not enough space to cache rdd_31_12 in memory! (computed 6.2 MB so far)
> 16/02/29 19:38:37 WARN MemoryStore: Not enough space to cache rdd_31_10 in memory! (computed 6.2 MB so far)
> 16/02/29 19:38:37 WARN MemoryStore: Not enough space to cache rdd_31_6 in memory! (computed 6.2 MB so far)
> Training: 13731669, validation: 4574414, test: 4578305
> [Stage 14:===========================================> (3 + 1) / 4]16/02/29 19:38:42 WARN MemoryStore: Not enough space to cache rdd_23_2 in memory! (computed 99.2 MB so far)
> [Stage 15:> (0 + 4) / 4]16/02/29 19:38:45 WARN MemoryStore: Not enough space to cache rdd_35_3 in memory! (computed 19.8 MB so far)
> 16/02/29 19:38:45 WARN CacheManager: Persisting partition rdd_35_3 to disk instead.
> [Stage 16:> (0 + 16) > / 16]16/02/29 19:38:56 > > *ERROR Executor: Exception in task 6.0 in stage 16.0 (TID 179)* > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofInt.mkArray(ArrayBuilder.scala:320) > at > scala.collection.mutable.ArrayBuilder$ofInt.resize(ArrayBuilder.scala:326) > at > scala.collection.mutable.ArrayBuilder$ofInt.ensureSize(ArrayBuilder.scala:338) > at > scala.collection.mutable.ArrayBuilder$ofInt.$plus$eq(ArrayBuilder.scala:343) > at > scala.collection.mutable.ArrayBuilder$ofInt.$plus$eq(ArrayBuilder.scala:313) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.add(ALS.scala:851) > at > org.apache.spark.ml.recommendation.ALS$$anonfun$15$$anonfun$apply$11.apply(ALS.scala:1066) > at > org.apache.spark.ml.recommendation.ALS$$anonfun$15$$anonfun$apply$11.apply(ALS.scala:1065) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at > org.apache.spark.util.collection.CompactBuffer$$anon$1.foreach(CompactBuffer.scala:115) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.util.collection.CompactBuffer.foreach(CompactBuffer.scala:30) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1065) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/02/29 19:39:01 ERROR Executor: Exception in task 3.0 in stage 16.0 (TID > 176) > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofInt.mkArray(ArrayBuilder.scala:320) > at > scala.collection.mutable.ArrayBuilder$ofInt.result(ArrayBuilder.scala:365) > at > scala.collection.mutable.ArrayBuilder$ofInt.result(ArrayBuilder.scala:313) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.build(ALS.scala:859) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1068) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > 
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/02/29 19:39:00 ERROR Executor: Exception in task 1.0 in stage 16.0 (TID > 174) > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofFloat.mkArray(ArrayBuilder.scala:448) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:493) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:441) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.build(ALS.scala:859) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1068) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/02/29 19:39:00 ERROR Executor: Exception in task 7.0 in stage 16.0 (TID > 180) > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofFloat.mkArray(ArrayBuilder.scala:448) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:493) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:441) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.build(ALS.scala:859) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1068) > at 
org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/02/29 19:39:01 ERROR SparkUncaughtExceptionHandler: Uncaught exception > in thread Thread[Executor task launch worker-3,5,main] > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofFloat.mkArray(ArrayBuilder.scala:448) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:493) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:441) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.build(ALS.scala:859) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1068) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/02/29 19:39:01 ERROR SparkUncaughtExceptionHandler: Uncaught exception > in thread Thread[Executor task launch worker-11,5,main] > java.lang.OutOfMemoryError: Java heap space > 
at > scala.collection.mutable.ArrayBuilder$ofFloat.mkArray(ArrayBuilder.scala:448) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:493) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:441) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.build(ALS.scala:859) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1068) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/02/29 19:39:01 ERROR SparkUncaughtExceptionHandler: Uncaught exception > in thread Thread[Executor task launch worker-4,5,main] > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofInt.mkArray(ArrayBuilder.scala:320) > at > scala.collection.mutable.ArrayBuilder$ofInt.resize(ArrayBuilder.scala:326) > at > scala.collection.mutable.ArrayBuilder$ofInt.ensureSize(ArrayBuilder.scala:338) > at > scala.collection.mutable.ArrayBuilder$ofInt.$plus$eq(ArrayBuilder.scala:343) > at > scala.collection.mutable.ArrayBuilder$ofInt.$plus$eq(ArrayBuilder.scala:313) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.add(ALS.scala:851) > at > org.apache.spark.ml.recommendation.ALS$$anonfun$15$$anonfun$apply$11.apply(ALS.scala:1066) > at > org.apache.spark.ml.recommendation.ALS$$anonfun$15$$anonfun$apply$11.apply(ALS.scala:1065) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at > org.apache.spark.util.collection.CompactBuffer$$anon$1.foreach(CompactBuffer.scala:115) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.util.collection.CompactBuffer.foreach(CompactBuffer.scala:30) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1065) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at 
org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/02/29 19:39:01 ERROR SparkUncaughtExceptionHandler: Uncaught exception > in thread Thread[Executor task launch worker-1,5,main] > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofInt.mkArray(ArrayBuilder.scala:320) > at > scala.collection.mutable.ArrayBuilder$ofInt.result(ArrayBuilder.scala:365) > at > scala.collection.mutable.ArrayBuilder$ofInt.result(ArrayBuilder.scala:313) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.build(ALS.scala:859) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1068) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/02/29 19:39:02 WARN TaskSetManager: Lost task 7.0 in stage 16.0 (TID > 180, localhost): java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofFloat.mkArray(ArrayBuilder.scala:448) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:493) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:441) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.build(ALS.scala:859) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1068) > at 
org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > > 16/02/29 19:39:03 ERROR TaskSetManager: Task 7 in stage 16.0 failed 1 > times; aborting job > Exception in thread "main" org.apache.spark.SparkException: Job aborted > due to stage failure: Task 7 in stage 16.0 failed 1 times, most recent > failure: Lost task 7.0 in stage 16.0 (TID 180, localhost): > java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofFloat.mkArray(ArrayBuilder.scala:448) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:493) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:441) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.build(ALS.scala:859) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1068) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > > Driver 
stacktrace: > at org.apache.spark.scheduler.DAGScheduler.org > $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) > at scala.Option.foreach(Option.scala:236) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1835) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1848) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1919) > at org.apache.spark.rdd.RDD.count(RDD.scala:1121) > at org.apache.spark.ml.recommendation.ALS$.train(ALS.scala:550) > at org.apache.spark.mllib.recommendation.ALS.run(ALS.scala:239) > at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:328) > at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:346) > at > MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$mcVD$sp$1.apply$mcVI$sp(MovieLensALS.scala:101) > at > MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$mcVD$sp$1.apply(MovieLensALS.scala:100) > at > MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1$$anonfun$apply$mcVD$sp$1.apply(MovieLensALS.scala:100) > at scala.collection.immutable.List.foreach(List.scala:318) > at > MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply$mcVD$sp(MovieLensALS.scala:100) > at > MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply(MovieLensALS.scala:100) > at > MovieLensALS$$anonfun$main$1$$anonfun$apply$mcVI$sp$1.apply(MovieLensALS.scala:100) > at scala.collection.immutable.List.foreach(List.scala:318) > at MovieLensALS$$anonfun$main$1.apply$mcVI$sp(MovieLensALS.scala:100) > at MovieLensALS$$anonfun$main$1.apply(MovieLensALS.scala:100) > at MovieLensALS$$anonfun$main$1.apply(MovieLensALS.scala:100) > at scala.collection.immutable.List.foreach(List.scala:318) > at MovieLensALS$.main(MovieLensALS.scala:100) > at MovieLensALS.main(MovieLensALS.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) > at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.OutOfMemoryError: Java heap space > at > scala.collection.mutable.ArrayBuilder$ofFloat.mkArray(ArrayBuilder.scala:448) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:493) > at > scala.collection.mutable.ArrayBuilder$ofFloat.result(ArrayBuilder.scala:441) > at > org.apache.spark.ml.recommendation.ALS$UncompressedInBlockBuilder.build(ALS.scala:859) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1068) > at org.apache.spark.ml.recommendation.ALS$$anonfun$15.apply(ALS.scala:1062) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$41$$anonfun$apply$42.apply(PairRDDFunctions.scala:700) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278) > at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:262) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 16/02/29 19:39:03 WARN QueuedThreadPool: 10 threads could not be stopped > 16/02/29 19:39:04 WARN QueuedThreadPool: 7 threads could not be stopped >
