Hi Wisely,

I am running on Amazon EC2 instances, so I have no reason to doubt the hardware. Moreover, my other pipelines run successfully; only this one, which involves broadcasting a large object, fails.
My spark-env.sh settings are:

SPARK_MASTER_IP=<MASTER-IP>
SPARK_LOCAL_IP=<LOCAL-IP>
SPARK_DRIVER_MEMORY=24g
SPARK_WORKER_MEMORY=28g
SPARK_EXECUTOR_MEMORY=26g
SPARK_WORKER_CORES=8

My spark-defaults.conf settings are:

spark.eventLog.enabled true
spark.eventLog.dir /srv/logs/
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator com.test.utils.KryoSerializationRegistrator
spark.executor.extraJavaOptions "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/srv/logs/ -XX:+UseG1GC"
spark.shuffle.consolidateFiles true
spark.shuffle.manager sort
spark.shuffle.compress true
spark.rdd.compress true

Thanks
Ankur

On Sat, Mar 28, 2015 at 7:57 AM, Wisely Chen <wiselyc...@appier.com> wrote:

> Hi Ankur
>
> If your hardware is OK, it looks like a config problem. Can you show me
> the config of spark-env.sh or the JVM config?
>
> Thanks
>
> Wisely Chen
>
> 2015-03-28 15:39 GMT+08:00 Ankur Srivastava <ankur.srivast...@gmail.com>:
>
>> Hi Wisely,
>> I have 26gb for driver and the master is running on m3.2xlarge machines.
>>
>> I see OOM errors on the workers even though they are running with 26 GB of memory.
>>
>> Thanks
>>
>> On Fri, Mar 27, 2015, 11:43 PM Wisely Chen <wiselyc...@appier.com> wrote:
>>
>>> Hi
>>>
>>> In a broadcast, Spark will collect the whole 3 GB object on the master node
>>> and broadcast it to each slave. It is very common that the master
>>> node doesn't have enough memory.
>>>
>>> What are your master node settings?
>>>
>>> Wisely Chen
>>>
>>> Ankur Srivastava <ankur.srivast...@gmail.com> wrote on Saturday, March 28, 2015:
>>>
>>>> I have increased "spark.storage.memoryFraction" to 0.4 but I still
>>>> get OOM errors on the Spark executor nodes:
>>>>
>>>> 15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block broadcast_5_piece10
>>>> 15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5 took 2704 ms
>>>> 15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208) called with curMem=2484698683, maxMem=9631778734
>>>> 15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 641.4 MB, free 6.0 GB)
>>>> 15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts
>>>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>>>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>>     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>>     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>>     at scala.concurrent.Await$.result(package.scala:107)
>>>>     at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
>>>>     at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)
>>>> 15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 4007)
>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>
>>>> Thanks
>>>>
>>>> Ankur
>>>>
>>>> On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am running a Spark cluster on EC2 instances of type m3.2xlarge. I
>>>>> have given 26 GB of memory and all 8 cores to my executors. I can see that
>>>>> in the logs too:
>>>>>
>>>>> *15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added: app-20150327213106-0000/0 on worker-20150327212934-10.x.y.z-40128 (10.x.y.z:40128) with 8 cores*
>>>>>
>>>>> I am not caching any RDD, so I have set "spark.storage.memoryFraction"
>>>>> to 0.2. In the Spark UI, under the Executors tab, memory used is 0.0/4.5 GB.
>>>>>
>>>>> I am now confused by these logs:
>>>>>
>>>>> *15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block manager 10.77.100.196:58407 with 4.5 GB RAM, BlockManagerId(4, 10.x.y.z, 58407)*
>>>>>
>>>>> I am broadcasting a large object of 3 GB, and after that, when I am
>>>>> creating an RDD, I see logs which show this 4.5 GB of memory getting full, and
>>>>> then I get an OOM.
>>>>>
>>>>> How can I make the block manager use more memory?
>>>>>
>>>>> Is there any other fine tuning I need to do for broadcasting large
>>>>> objects?
>>>>>
>>>>> And does a broadcast variable use cache memory or the rest of the heap?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Ankur
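
A note on the 4.5 GB and maxMem figures quoted in this thread. The sketch below is only a back-of-the-envelope check, and it rests on an assumption: that the block manager's capacity in this Spark 1.x setup is roughly the JVM's reported maximum heap multiplied by spark.storage.memoryFraction and by spark.storage.safetyFraction (assumed to be at its default of 0.9). The function names are hypothetical and exist only for this illustration; the only inputs taken from the thread are maxMem=9631778734 (logged with memoryFraction 0.4) and the two memoryFraction values.

```python
GIB = 1024 ** 3

# Assumption: storage pool = JVM max heap * memoryFraction * safetyFraction,
# with spark.storage.safetyFraction left at an assumed default of 0.9.
SAFETY_FRACTION = 0.9


def storage_pool_bytes(jvm_max_bytes, memory_fraction, safety_fraction=SAFETY_FRACTION):
    """Estimate the block manager capacity for a given heap and memoryFraction."""
    return int(jvm_max_bytes * memory_fraction * safety_fraction)


def implied_jvm_max_bytes(pool_bytes, memory_fraction, safety_fraction=SAFETY_FRACTION):
    """Invert the estimate: which JVM max heap would yield this pool size?"""
    return pool_bytes / (memory_fraction * safety_fraction)


# maxMem from the MemoryStore log line, reported after memoryFraction was raised to 0.4.
max_mem_at_0_4 = 9631778734

heap = implied_jvm_max_bytes(max_mem_at_0_4, 0.4)
pool_at_0_2 = storage_pool_bytes(heap, 0.2)

print(f"implied JVM max heap: {heap / GIB:.1f} GiB")         # about 24.9, a little under the 26g requested
print(f"estimated pool at 0.2: {pool_at_0_2 / GIB:.1f} GiB")  # about 4.5, in line with the block manager log
```

If that assumption holds, the 4.5 GB reported by the block manager is simply the 0.2 fraction of a heap slightly smaller than the configured 26g, and raising spark.storage.memoryFraction is the setting that enlarges it. The MemoryStore line "Block broadcast_5 stored as values in memory" also suggests that broadcast blocks are held in this same pool.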