I have increased "spark.storage.memoryFraction" to 0.4, but I still get OOM errors on the Spark executor nodes.
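For reference, this is roughly how I am setting it on the driver (a minimal sketch; the app name is a placeholder, not my actual code):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch of my driver setup; values mirror what I describe below.
    val conf = new SparkConf()
      .setAppName("my-app")                        // placeholder name
      .set("spark.executor.memory", "26g")         // 26 GB per executor
      .set("spark.storage.memoryFraction", "0.4")  // raised from the 0.2 I had set earlier
    val sc = new SparkContext(conf)

The executor log around the failure: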
15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block broadcast_5_piece10
15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5 took 2704 ms
15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208) called with curMem=2484698683, maxMem=9631778734
15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 641.4 MB, free 6.0 GB)
15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)
15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 4007)
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)

Thanks
Ankur

On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:

> Hi All,
>
> I am running a Spark cluster on EC2 instances of type m3.2xlarge. I have
> given 26 GB of memory with all 8 cores to my executors. I can see that in
> the logs too:
>
> *15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added:
> app-20150327213106-0000/0 on worker-20150327212934-10.x.y.z-40128
> (10.x.y.z:40128) with 8 cores*
>
> I am not caching any RDD, so I have set "spark.storage.memoryFraction" to
> 0.2. On the Spark UI, under the Executors tab, I can see "Memory Used" is
> 0.0 / 4.5 GB.
>
> I am now confused by these logs:
>
> *15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block manager
> 10.77.100.196:58407 with 4.5 GB RAM, BlockManagerId(4, 10.x.y.z, 58407)*
>
> I am broadcasting a large object of 3 GB, and after that, when I create an
> RDD, I see in the logs that this 4.5 GB of memory fills up, and then I get
> an OOM.
>
> How can I make the block manager use more memory?
>
> Is there any other fine tuning I need to do for broadcasting large
> objects?
>
> And does a broadcast variable use the cache memory or the rest of the
> heap?
>
> Thanks
> Ankur
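Re the broadcast step quoted above: the job is shaped roughly like this (a minimal sketch; the names, input path, and trivial loader are placeholders, not my actual code):

    import org.apache.spark.rdd.RDD

    // Stand-in for the real load that builds the ~3 GB object on the driver.
    def buildLargeMap(): Map[String, String] = Map.empty

    val table: Map[String, String] = buildLargeMap()
    val tableBc = sc.broadcast(table)  // TorrentBroadcast ships it to executors in pieces

    val input: RDD[String] = sc.textFile("s3://bucket/input")  // placeholder input path
    val enriched = input.map { key =>
      // Each task dereferences the broadcast value via the executor's block manager.
      tableBc.value.getOrElse(key, "unknown")
    }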