Hi Wisely, I am running Spark 1.2.1, and I have checked the process heap: it is running with all the heap that I am assigning. As I mentioned earlier, I get OOM on the workers, not the driver or master.
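For anyone following along: one way to confirm what heap a running Spark daemon actually got is to grep its command line for the JVM heap flags, as suggested later in this thread. A minimal sketch (the process line below is a hypothetical stand-in for real `ps auxw` output on a worker machine):

```shell
# Extract the -Xms/-Xmx flags a JVM was started with.
# ps_line is a made-up example; on a real worker, replace the printf with:
#   ps auxw | grep 'org.apache.spark.deploy.worker.[W]orker'
ps_line='java -cp /opt/spark/lib/* -Xms28g -Xmx28g org.apache.spark.deploy.worker.Worker'
heap_flags=$(printf '%s\n' "$ps_line" | grep -o -- '-Xm[sx][0-9]*[gm]')
echo "$heap_flags"
```

If the printed flags don't match what you set in spark-env.sh, the config file is not being picked up by that daemon.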
Thanks
Ankur

On Mon, Mar 30, 2015 at 9:24 AM, giive chen <thegi...@gmail.com> wrote:

> Hi Ankur
>
> If you are using standalone mode, your config is wrong. You should use
> "export SPARK_DAEMON_MEMORY=xxx" in conf/spark-env.sh. At least it works
> on my Spark 1.3.0 standalone-mode machine.
>
> BTW, SPARK_DRIVER_MEMORY is used in YARN mode, and it looks like
> standalone mode doesn't use this config.
>
> To debug this, type "ps auxw | grep
> org.apache.spark.deploy.master.[M]aster" on the master machine.
> You will see the Xmx and Xms options.
>
> Wisely Chen
>
> On Mon, Mar 30, 2015 at 3:55 AM, Ankur Srivastava
> <ankur.srivast...@gmail.com> wrote:
>
>> Hi Wisely,
>>
>> I am running on Amazon EC2 instances, so I cannot blame the hardware.
>> Moreover, my other pipelines run successfully; only this one, which
>> involves broadcasting a large object, fails.
>>
>> My spark-env.sh settings are:
>>
>> SPARK_MASTER_IP=<MASTER-IP>
>> SPARK_LOCAL_IP=<LOCAL-IP>
>> SPARK_DRIVER_MEMORY=24g
>> SPARK_WORKER_MEMORY=28g
>> SPARK_EXECUTOR_MEMORY=26g
>> SPARK_WORKER_CORES=8
>>
>> My spark-defaults.conf settings are:
>>
>> spark.eventLog.enabled          true
>> spark.eventLog.dir              /srv/logs/
>> spark.serializer                org.apache.spark.serializer.KryoSerializer
>> spark.kryo.registrator          com.test.utils.KryoSerializationRegistrator
>> spark.executor.extraJavaOptions "-verbose:gc -XX:+PrintGCDetails
>>   -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError
>>   -XX:HeapDumpPath=/srv/logs/ -XX:+UseG1GC"
>> spark.shuffle.consolidateFiles  true
>> spark.shuffle.manager           sort
>> spark.shuffle.compress          true
>> spark.rdd.compress              true
>>
>> Thanks
>> Ankur
>>
>> On Sat, Mar 28, 2015 at 7:57 AM, Wisely Chen <wiselyc...@appier.com>
>> wrote:
>>
>>> Hi Ankur
>>>
>>> If your hardware is ok, it looks like a config problem. Can you show
>>> me the config of spark-env.sh or the JVM config?
>>>
>>> Thanks
>>>
>>> Wisely Chen
>>>
>>> 2015-03-28 15:39 GMT+08:00 Ankur Srivastava <ankur.srivast...@gmail.com>:
>>>
>>>> Hi Wisely,
>>>> I have 26gb for the driver, and the master is running on m3.2xlarge
>>>> machines.
>>>>
>>>> I see OOM errors on workers even though they are running with 26gb
>>>> of memory.
>>>>
>>>> Thanks
>>>>
>>>> On Fri, Mar 27, 2015, 11:43 PM Wisely Chen <wiselyc...@appier.com>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> In broadcast, Spark will collect the whole 3gb object onto the
>>>>> master node and broadcast it to each slave. It is a very common
>>>>> situation that the master node doesn't have enough memory.
>>>>>
>>>>> What are your master node settings?
>>>>>
>>>>> Wisely Chen
>>>>>
>>>>> Ankur Srivastava <ankur.srivast...@gmail.com> wrote on Saturday,
>>>>> March 28, 2015:
>>>>>
>>>>>> I have increased "spark.storage.memoryFraction" to 0.4 but I still
>>>>>> get OOM errors on Spark executor nodes:
>>>>>>
>>>>>> 15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block
>>>>>> broadcast_5_piece10
>>>>>> 15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast
>>>>>> variable 5 took 2704 ms
>>>>>> 15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208)
>>>>>> called with curMem=2484698683, maxMem=9631778734
>>>>>> 15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as
>>>>>> values in memory (estimated size 641.4 MB, free 6.0 GB)
>>>>>> 15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts
>>>>>> java.util.concurrent.TimeoutException: Futures timed out after
>>>>>> [30 seconds]
>>>>>>     at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>>>>     at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>>>>     at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>>>>     at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>>>>     at scala.concurrent.Await$.result(package.scala:107)
>>>>>>     at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
>>>>>>     at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)
>>>>>>
>>>>>> 15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage
>>>>>> 2.0 (TID 4007)
>>>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
>>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>>>
>>>>>> Thanks
>>>>>> Ankur
>>>>>>
>>>>>> On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava
>>>>>> <ankur.srivast...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am running a Spark cluster on EC2 instances of type m3.2xlarge.
>>>>>>> I have given 26gb of memory with all 8 cores to my executors. I
>>>>>>> can see that in the logs too:
>>>>>>>
>>>>>>> 15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added:
>>>>>>> app-20150327213106-0000/0 on worker-20150327212934-10.x.y.z-40128
>>>>>>> (10.x.y.z:40128) with 8 cores
>>>>>>>
>>>>>>> I am not caching any RDD, so I have set
>>>>>>> "spark.storage.memoryFraction" to 0.2. On the Spark UI, under the
>>>>>>> executors tab, Memory Used shows 0.0/4.5 GB.
>>>>>>>
>>>>>>> I am now confused by these logs:
>>>>>>>
>>>>>>> 15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block
>>>>>>> manager 10.77.100.196:58407 with 4.5 GB RAM, BlockManagerId(4,
>>>>>>> 10.x.y.z, 58407)
>>>>>>>
>>>>>>> I am broadcasting a large object of 3gb, and after that, when I
>>>>>>> create an RDD, I see logs showing this 4.5 GB of memory filling
>>>>>>> up, and then I get OOM.
>>>>>>>
>>>>>>> How can I make the block manager use more memory?
>>>>>>>
>>>>>>> Is there any other fine tuning I need to do for broadcasting
>>>>>>> large objects?
>>>>>>>
>>>>>>> And does a broadcast variable use the cache memory or the rest of
>>>>>>> the heap?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ankur
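For what it's worth, the ~4.5 GB the block manager reports is roughly what Spark 1.x's storage-memory formula predicts: executor heap x spark.storage.memoryFraction x an internal safety fraction (0.9 by default in this Spark version). A quick sketch of the arithmetic, using the 26g executor heap and 0.2 fraction from the settings quoted above:

```shell
# Approximate storage memory available to the block manager in Spark 1.x:
#   executor heap * spark.storage.memoryFraction * spark.storage.safetyFraction
executor_mem_gb=26      # SPARK_EXECUTOR_MEMORY from the spark-env.sh above
memory_fraction=0.2     # spark.storage.memoryFraction
safety_fraction=0.9     # Spark 1.x internal default
storage_gb=$(awk -v m="$executor_mem_gb" -v f="$memory_fraction" -v s="$safety_fraction" \
  'BEGIN { printf "%.2f", m * f * s }')
echo "$storage_gb"      # close to the ~4.5 GB the UI and logs report
```

By the same formula, raising the fraction to 0.4 roughly doubles the storage pool, which is in the same ballpark as the maxMem=9631778734 bytes seen in the later executor logs.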