Re: Understanding Spark Memory distribution

2015-03-28 Thread Wisely Chen
Hi Ankur

If your hardware is okay, it looks like a configuration problem. Can you show
me your spark-env.sh or JVM config?
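
For reference, something like the following is what I mean (a minimal sketch
using Spark 1.x property names; the values are placeholders, not
recommendations):

import org.apache.spark.{SparkConf, SparkContext}

// Memory-related settings that matter for large broadcasts.
val conf = new SparkConf()
  .setAppName("broadcast-oom-debug")
  .set("spark.executor.memory", "26g")        // heap per executor
  .set("spark.driver.memory", "26g")          // only effective if set before the driver
                                              // JVM starts (spark-submit --driver-memory)
  .set("spark.storage.memoryFraction", "0.2") // share of the heap the BlockManager may use
val sc = new SparkContext(conf)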

Thanks

Wisely Chen

2015-03-28 15:39 GMT+08:00 Ankur Srivastava ankur.srivast...@gmail.com:

 Hi Wisely,
 I have 26 GB for the driver, and the master is running on m3.2xlarge machines.

 I see OOM errors on the workers even though they are running with 26 GB of memory.

 Thanks

 On Fri, Mar 27, 2015, 11:43 PM Wisely Chen wiselyc...@appier.com wrote:

 Hi

 With a broadcast, Spark materializes the whole 3 GB object on the master
 (driver) node and then broadcasts it to each slave. It is a very common
 situation that the master node doesn't have enough memory.
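
 In code form, a minimal sketch of what happens (hypothetical names; the
 small Map below stands in for the real ~3 GB object):

 import org.apache.spark.SparkContext

 def run(sc: SparkContext): Unit = {
   // The object is fully materialized in the driver JVM first...
   val lookup: Map[String, Int] = Map("a" -> 1) // stand-in for the ~3 GB object
   // ...then TorrentBroadcast serializes it on the driver and ships pieces
   // to the executors, each of which caches the deserialized value.
   val bc = sc.broadcast(lookup)
   sc.parallelize(1 to 100)
     .map(i => bc.value.getOrElse("a", 0) + i)
     .count()
 }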

 What are your master node settings?

 Wisely Chen

 Ankur Srivastava ankur.srivast...@gmail.com wrote on Saturday, March 28, 2015:

 I have increased spark.storage.memoryFraction to 0.4, but I still get OOM
 errors on the Spark executor nodes:


 15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block
 broadcast_5_piece10

 15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5
 took 2704 ms

 15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208) called
 with curMem=2484698683, maxMem=9631778734

 15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as values
 in memory (estimated size 641.4 MB, free 6.0 GB)

 15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts

 java.util.concurrent.TimeoutException: Futures timed out after [30
 seconds]

 at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

 at
 scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)

 at
 scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)

 at
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)

 at scala.concurrent.Await$.result(package.scala:107)

 at
 org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)

 at
 org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)

 15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage 2.0
 (TID 4007)

 java.lang.OutOfMemoryError: GC overhead limit exceeded

 at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)

 at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)

 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)

 Thanks

 Ankur

 On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava 
 ankur.srivast...@gmail.com wrote:

 Hi All,

 I am running a Spark cluster on EC2 instances of type m3.2xlarge. I have
 given 26 GB of memory, with all 8 cores, to my executors. I can see that
 in the logs too:

 15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added:
 app-20150327213106-/0 on worker-20150327212934-10.x.y.z-40128
 (10.x.y.z:40128) with 8 cores

 I am not caching any RDDs, so I have set spark.storage.memoryFraction to
 0.2. On the Spark UI, under the Executors tab, Memory Used shows 0.0 / 4.5 GB.

 I am now confused by these logs:

 15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block
 manager 10.77.100.196:58407 with 4.5 GB RAM,
 BlockManagerId(4, 10.x.y.z, 58407)
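
 For what it's worth, that 4.5 GB is consistent with the Spark 1.x
 storage-pool sizing, assuming the default spark.storage.safetyFraction of
 0.9 (a back-of-the-envelope sketch that ignores exact JVM overhead):

 // storage pool ≈ executor heap * memoryFraction * safetyFraction
 val executorHeapGb = 26.0 // the 26 GB given to each executor
 val memoryFraction = 0.2  // spark.storage.memoryFraction
 val safetyFraction = 0.9  // spark.storage.safetyFraction (default)
 val storagePoolGb  = executorHeapGb * memoryFraction * safetyFraction
 println(f"$storagePoolGb%.1f GB") // ≈ 4.7 GB, close to the reported 4.5 GB

 The same formula matches the maxMem=9631778734 (≈ 9.0 GB) in the MemoryStore
 log further up, taken after the fraction was raised to 0.4:
 26 * 0.4 * 0.9 ≈ 9.4 GB, minus JVM overhead.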

 I am broadcasting a large object of 3 GB, and after that, when I create an
 RDD, I see logs showing this 4.5 GB of memory filling up, and then I get
 the OOM.

 How can I make the block manager use more memory?

 Is there any other fine-tuning I need to do for broadcasting large
 objects?

 And does a broadcast variable use the cache memory or the rest of the heap?


 Thanks

 Ankur




