Hi Wisely,

I am running on Amazon EC2 instances, so I don't suspect a hardware problem.
Moreover, all my other pipelines run successfully; only this one, which
involves broadcasting a large object, fails.

My spark-env.sh settings are:

SPARK_MASTER_IP=<MASTER-IP>

SPARK_LOCAL_IP=<LOCAL-IP>

SPARK_DRIVER_MEMORY=24g

SPARK_WORKER_MEMORY=28g

SPARK_EXECUTOR_MEMORY=26g

SPARK_WORKER_CORES=8

My spark-defaults.conf settings are:

spark.eventLog.enabled           true

spark.eventLog.dir               /srv/logs/

spark.serializer                 org.apache.spark.serializer.KryoSerializer

spark.kryo.registrator           com.test.utils.KryoSerializationRegistrator

spark.executor.extraJavaOptions  "-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/srv/logs/ -XX:+UseG1GC"

spark.shuffle.consolidateFiles   true

spark.shuffle.manager            sort

spark.shuffle.compress           true

spark.rdd.compress               true
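
For completeness, the registrator class referenced above
(com.test.utils.KryoSerializationRegistrator) looks roughly like this; a
minimal sketch in which the registered classes are placeholders for whatever
types actually make up the broadcast object:

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Placeholder for the real types inside the broadcast object.
case class Record(id: String, value: Double)

class KryoSerializationRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Registering concrete classes lets Kryo write compact class IDs
    // instead of full class names for every serialized instance.
    kryo.register(classOf[Record])
    kryo.register(classOf[Array[Record]])
  }
}
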
Thanks
Ankur

On Sat, Mar 28, 2015 at 7:57 AM, Wisely Chen <wiselyc...@appier.com> wrote:

> Hi Ankur
>
> If your hardware is OK, it looks like a configuration problem. Can you show
> me your spark-env.sh settings or JVM config?
>
> Thanks
>
> Wisely Chen
>
> 2015-03-28 15:39 GMT+08:00 Ankur Srivastava <ankur.srivast...@gmail.com>:
>
>> Hi Wisely,
>> I have 26gb for the driver, and the master is running on an m3.2xlarge machine.
>>
>> I see OOM errors on the workers, even though they are running with 26gb of memory.
>>
>> Thanks
>>
>> On Fri, Mar 27, 2015, 11:43 PM Wisely Chen <wiselyc...@appier.com> wrote:
>>
>>> Hi
>>>
>>> With a broadcast, Spark collects the whole 3gb object on the master node
>>> and then broadcasts it to each slave. It is a very common situation that
>>> the master node does not have enough memory for this.
>>>
>>> What are your master node settings?
>>>
>>> Wisely Chen
>>>
>>> Ankur Srivastava <ankur.srivast...@gmail.com> wrote on Saturday, March 28, 2015:
>>>
>>>> I have increased "spark.storage.memoryFraction" to 0.4, but I still
>>>> get OOM errors on the Spark executor nodes:
>>>>
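>>>> (One way to set this, for reference, is programmatically on the SparkConf
>>>> when the context is created; a minimal sketch, with the app name being a
>>>> hypothetical placeholder:)
>>>>
>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>
>>>> val conf = new SparkConf()
>>>>   .setAppName("broadcast-pipeline")            // hypothetical name
>>>>   .set("spark.storage.memoryFraction", "0.4")  // raised from 0.2
>>>> val sc = new SparkContext(conf)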
>>>>
>>>> 15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block
>>>> broadcast_5_piece10
>>>>
>>>> 15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5
>>>> took 2704 ms
>>>>
>>>> 15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208) called
>>>> with curMem=2484698683, maxMem=9631778734
>>>>
>>>> 15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as values
>>>> in memory (estimated size 641.4 MB, free 6.0 GB)
>>>>
>>>> 15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts
>>>>
>>>> java.util.concurrent.TimeoutException: Futures timed out after [30
>>>> seconds]
>>>>
>>>>         at
>>>> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>>
>>>>         at
>>>> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>>
>>>>         at
>>>> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>>
>>>>         at
>>>> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>>
>>>>         at scala.concurrent.Await$.result(package.scala:107)
>>>>
>>>>         at
>>>> org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
>>>>
>>>>         at
>>>> org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)
>>>>
>>>> 15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage 2.0
>>>> (TID 4007)
>>>>
>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>
>>>>         at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
>>>>
>>>>         at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>>
>>>>         at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>>
>>>> Thanks
>>>>
>>>> Ankur
>>>>
>>>> On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava <
>>>> ankur.srivast...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am running a Spark cluster on EC2 instances of type m3.2xlarge. I
>>>>> have given 26gb of memory and all 8 cores to my executors. I can see
>>>>> that in the logs too:
>>>>>
>>>>> 15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added:
>>>>> app-20150327213106-0000/0 on worker-20150327212934-10.x.y.z-40128
>>>>> (10.x.y.z:40128) with 8 cores
>>>>>
>>>>> I am not caching any RDDs, so I have set "spark.storage.memoryFraction"
>>>>> to 0.2. On the Spark UI, under the Executors tab, Memory Used shows
>>>>> 0.0 / 4.5 GB.
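>>>>>
>>>>> (If I read the Spark 1.x memory model correctly, that 4.5 GB is just the
>>>>> storage slice of the heap: roughly 26 GB x 0.2 (memoryFraction) x 0.9
>>>>> (the default safety fraction) = about 4.7 GB of -Xmx, which comes out to
>>>>> roughly 4.5 GB of the slightly smaller heap the JVM actually reports.)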
>>>>>
>>>>> I am now confused by these logs:
>>>>>
>>>>> 15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block
>>>>> manager 10.77.100.196:58407 with 4.5 GB RAM,
>>>>> BlockManagerId(4, 10.x.y.z, 58407)
>>>>>
>>>>> I am broadcasting a large object of 3 GB, and when I then create an
>>>>> RDD, I see logs showing this 4.5 GB of memory filling up, after which
>>>>> I get the OOM.
>>>>>
>>>>> How can I make the block manager use more memory?
>>>>>
>>>>> Is there any other fine-tuning I need to do for broadcasting large
>>>>> objects?
>>>>>
>>>>> And does a broadcast variable use the cache memory or the rest of the heap?
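>>>>>
>>>>> For context, the broadcast pattern in the pipeline is roughly the
>>>>> following (a minimal sketch: the lookup map, its contents, and the S3
>>>>> paths are illustrative stand-ins, not the actual job):
>>>>>
>>>>> import org.apache.spark.{SparkConf, SparkContext}
>>>>>
>>>>> val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch"))
>>>>>
>>>>> // Stand-in for the real ~3 GB lookup table, built entirely on the driver.
>>>>> def buildLookup(): Map[String, String] = Map("key" -> "value")
>>>>> val lookup = buildLookup()
>>>>>
>>>>> // TorrentBroadcast ships the serialized object to each executor once;
>>>>> // executors keep it in the BlockManager, i.e. in the storage fraction.
>>>>> val lookupBc = sc.broadcast(lookup)
>>>>>
>>>>> val enriched = sc.textFile("s3n://bucket/input")
>>>>>   .map(line => lookupBc.value.getOrElse(line, "unknown"))
>>>>> enriched.saveAsTextFile("s3n://bucket/output")
>>>>> sc.stop()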
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>> Ankur
>>>>>
>>>>
>>>>
>
