Broadcast variables count towards "spark.storage.memoryFraction", so they
use the same "pool" of memory as cached RDDs.
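
If it helps, here is a minimal sketch of that interaction (assuming a Spark
1.x-style configuration; the 0.4 fraction and the tiny map are purely
illustrative, not a recommendation):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("broadcast-vs-storage-demo")
      // Fraction of executor heap reserved for storage; cached RDD blocks
      // *and* broadcast blocks share this pool (default 0.6 in Spark 1.x).
      .set("spark.storage.memoryFraction", "0.4")
    val sc = new SparkContext(conf)

    // The broadcast's blocks are held by the BlockManager on each executor
    // that uses it, and count against that same storage pool.
    val lookup = sc.broadcast(Map("a" -> 1, "b" -> 2))
    val n = sc.parallelize(1 to 100)
      .map(i => i + lookup.value.getOrElse("a", 0))
      .count()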

That said, I'm really not sure why you are running into problems; it
seems like you have plenty of memory available.  Most likely it has
nothing to do with broadcast variables or caching -- it's just whatever
logic you are applying in your transformations that is causing lots of GC
during the computation.  Hard to say without knowing more details.

You could try increasing the timeout for the failed askWithReply by
raising "spark.akka.lookupTimeout" (defaults to 30 seconds), but that
would most likely be treating a symptom rather than the root cause.
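
Something along these lines, assuming you set it via SparkConf (the value
is in seconds; 120 here is just an example):

    val conf = new SparkConf()
      // Timeout for Akka actor lookups / askWithReply calls; default is 30s.
      .set("spark.akka.lookupTimeout", "120")

    // or, equivalently, on the command line:
    //   spark-submit --conf spark.akka.lookupTimeout=120 ...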

On Fri, Mar 27, 2015 at 4:52 PM, Ankur Srivastava <
ankur.srivast...@gmail.com> wrote:

> Hi All,
>
> I am running a Spark cluster on EC2 instances of type m3.2xlarge. I have
> given 26 GB of memory and all 8 cores to my executors. I can see that in
> the logs too:
>
> *15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added:
> app-20150327213106-0000/0 on worker-20150327212934-10.x.y.z-40128
> (10.x.y.z:40128) with 8 cores*
>
> I am not caching any RDDs, so I have set "spark.storage.memoryFraction" to
> 0.2. On the Spark UI, under the Executors tab, memory used is 0.0 / 4.5 GB.
>
> I am now confused by these logs:
>
> *15/03/27 21:31:08 INFO BlockManagerMasterActor: Registering block manager
> 10.77.100.196:58407 with 4.5 GB RAM,
> BlockManagerId(4, 10.x.y.z, 58407)*
>
> I am broadcasting a large object of 3 GB, and after that, when I create
> an RDD, I see logs showing this 4.5 GB of memory filling up, and then I
> get an OOM.
>
> How can I make block manager use more memory?
>
> Is there any other fine tuning I need to do for broadcasting large objects?
>
> And does a broadcast variable use the cache memory or the rest of the heap?
>
>
> Thanks
>
> Ankur
>
