Hi Wisely, I am running Spark 1.2.1, and I have checked the process heap: it is running with all the heap that I am assigning. As I mentioned earlier, I get OOM on the workers, not the driver or master.
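For anyone following along: one way to confirm what heap a running Spark daemon actually got is to grep its command line for the JVM heap flags, as suggested later in this thread. A minimal sketch (the process line below is a hypothetical stand-in for real `ps auxw` output on a worker machine):

```shell
# Extract the -Xms/-Xmx flags a JVM was started with.
# ps_line is a made-up example; on a real worker, replace the printf with:
#   ps auxw | grep 'org.apache.spark.deploy.worker.[W]orker'
ps_line='java -cp /opt/spark/lib/* -Xms28g -Xmx28g org.apache.spark.deploy.worker.Worker'
heap_flags=$(printf '%s\n' "$ps_line" | grep -o -- '-Xm[sx][0-9]*[gm]')
echo "$heap_flags"
```

If the printed flags don't match what you set in spark-env.sh, the config file is not being picked up by that daemon.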
Thanks
Ankur

On Mon, Mar 30, 2015 at 9:24 AM, giive chen <thegi...@gmail.com> wrote:

> Hi Ankur
>
> If you are using standalone mode, your config is wrong. You should use
> "export SPARK_DAEMON_MEMORY=xxx" in conf/spark-env.sh. At least it works
> on my Spark 1.3.0 standalone-mode machine.
>
> BTW, SPARK_DRIVER_MEMORY is used in YARN mode, and it looks like
> standalone mode doesn't use this config.
>
> To debug this, type "ps auxw | grep
> org.apache.spark.deploy.master.[M]aster" on the master machine.
> You will see the Xmx and Xms options.
>
> Wisely Chen
>
> On Mon, Mar 30, 2015 at 3:55 AM, Ankur Srivastava
> <ankur.srivast...@gmail.com> wrote:
>
>> Hi Wisely,
>>
>> I am running on Amazon EC2 instances, so I cannot blame the hardware.
>> Moreover, my other pipelines run successfully; only this one, which
>> involves broadcasting a large object, fails.
>>
>> My spark-env.sh settings are:
>>
>> SPARK_MASTER_IP=<MASTER-IP>
>> SPARK_LOCAL_IP=<LOCAL-IP>
>> SPARK_DRIVER_MEMORY=24g
>> SPARK_WORKER_MEMORY=28g
>> SPARK_EXECUTOR_MEMORY=26g
>> SPARK_WORKER_CORES=8
>>
>> My spark-defaults.conf settings are:
>>
>> spark.eventLog.enabled          true
>> spark.eventLog.dir              /srv/logs/
>> spark.serializer                org.apache.spark.serializer.KryoSerializer
>> spark.kryo.registrator          com.test.utils.KryoSerializationRegistrator
>> spark.executor.extraJavaOptions "-verbose:gc -XX:+PrintGCDetails
>>   -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError
>>   -XX:HeapDumpPath=/srv/logs/ -XX:+UseG1GC"
>> spark.shuffle.consolidateFiles  true
>> spark.shuffle.manager           sort
>> spark.shuffle.compress          true
>> spark.rdd.compress              true
>>
>> Thanks
>> Ankur
>>
>> On Sat, Mar 28, 2015 at 7:57 AM, Wisely Chen <wiselyc...@appier.com>
>> wrote:
>>
>>> Hi Ankur
>>>
>>> If your hardware is ok, it looks like a config problem. Can you show
>>> me the config of spark-env.sh or the JVM config?
>>>
>>> Thanks
>>>
>>> Wisely Chen
>>>
>>> 2015-03-28 15:39 GMT+08:00 Ankur Srivastava <ankur.srivast...@gmail.com>:
>>>
>>>> Hi Wisely,
>>>> I have 26gb for the driver, and the master is running on m3.2xlarge
>>>> machines.
>>>>
>>>> I see OOM errors on workers even though they are running with 26gb
>>>> of memory.
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Mar 27, 2015, 11:43 PM Wisely Chen <wiselyc...@appier.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> In broadcast, Spark will collect the whole 3gb object onto the
>>>>> master node and broadcast it to each slave. It is a very common
>>>>> situation that the master node doesn't have enough memory.
>>>>>
>>>>> What are your master node settings?
>>>>>
>>>>> Wisely Chen
>>>>>
>>>>> Ankur Srivastava <ankur.srivast...@gmail.com> wrote on Saturday,
>>>>> March 28, 2015:
>>>>>
>>>>>> I have increased "spark.storage.memoryFraction" to 0.4 but I still
>>>>>> get OOM errors on Spark executor nodes:
>>>>>>
>>>>>> 15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block
>>>>>> broadcast_5_piece10
>>>>>> 15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast
>>>>>> variable 5 took 2704 ms
>>>>>> 15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208)
>>>>>> called with curMem=2484698683, maxMem=9631778734
>>>>>> 15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as
>>>>>> values in memory (estimated size 641.4 MB, free 6.0 GB)
>>>>>> 15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts
>>>>>> java.util.concurrent.TimeoutException: Futures timed out after
>>>>>> [30 seconds]
>>>>>>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>>>>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>>>>     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>>>>     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>>>>     at scala.concurrent.Await$.result(package.scala:107)
>>>>>>     at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
>>>>>>     at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)
>>>>>>
>>>>>> 15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage
>>>>>> 2.0 (TID 4007)
>>>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
>>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>>
>>>>>> Thanks
>>>>>> Ankur
>>>>>>
>>>>>> On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava
>>>>>> <ankur.srivast...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am running a Spark cluster on EC2 instances of type m3.2xlarge.
>>>>>>> I have given 26gb of memory with all 8 cores to my executors. I
>>>>>>> can see that in the logs too:
>>>>>>>
>>>>>>> 15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added:
>>>>>>> app-20150327213106-0000/0 on worker-20150327212934-10.x.y.z-40128
>>>>>>> (10.x.y.z:40128) with 8 cores
>>>>>>>
>>>>>>> I am not caching any RDD, so I have set
>>>>>>> "spark.storage.memoryFraction" to 0.2. On the Spark UI, under the
>>>>>>> executors tab, Memory Used shows 0.0/4.5 GB.
>>>>>>>
>>>>>>> I am now confused by these logs:
>>>>>>>
>>>>>>> 15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block
>>>>>>> manager 10.77.100.196:58407 with 4.5 GB RAM, BlockManagerId(4,
>>>>>>> 10.x.y.z, 58407)
>>>>>>>
>>>>>>> I am broadcasting a large object of 3gb, and after that, when I
>>>>>>> create an RDD, I see logs showing this 4.5 GB of memory filling
>>>>>>> up, and then I get OOM.
>>>>>>>
>>>>>>> How can I make the block manager use more memory?
>>>>>>>
>>>>>>> Is there any other fine tuning I need to do for broadcasting
>>>>>>> large objects?
>>>>>>>
>>>>>>> And does a broadcast variable use the cache memory or the rest of
>>>>>>> the heap?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ankur
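For what it's worth, the ~4.5 GB the block manager reports is roughly what Spark 1.x's storage-memory formula predicts: executor heap x spark.storage.memoryFraction x an internal safety fraction (0.9 by default in this Spark version). A quick sketch of the arithmetic, using the 26g executor heap and 0.2 fraction from the settings quoted above:

```shell
# Approximate storage memory available to the block manager in Spark 1.x:
#   executor heap * spark.storage.memoryFraction * spark.storage.safetyFraction
executor_mem_gb=26      # SPARK_EXECUTOR_MEMORY from the spark-env.sh above
memory_fraction=0.2     # spark.storage.memoryFraction
safety_fraction=0.9     # Spark 1.x internal default
storage_gb=$(awk -v m="$executor_mem_gb" -v f="$memory_fraction" -v s="$safety_fraction" \
  'BEGIN { printf "%.2f", m * f * s }')
echo "$storage_gb"      # close to the ~4.5 GB the UI and logs report
```

By the same formula, raising the fraction to 0.4 roughly doubles the storage pool, which is in the same ballpark as the maxMem=9631778734 bytes seen in the later executor logs.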