Note that starting with Spark 1.6, memory can be dynamically allocated by
the Spark execution engine based on workload heuristics.

You can still set a low watermark for the RDD cache (spark.memory.storageFraction
under the new unified memory manager; the old spark.storage.memoryFraction only
applies in legacy mode), but the rest can be dynamic.
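
For reference, here's a minimal Scala sketch of the 1.6 unified-memory knobs
(the values shown are just the 1.6 defaults, and the app name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("unified-memory-example") // illustrative name
      // fraction of (heap - 300MB reserved) shared dynamically
      // between execution and storage; 0.75 is the 1.6 default
      .set("spark.memory.fraction", "0.75")
      // the "low watermark": storage memory immune to eviction
      // by execution; 0.5 is the 1.6 default
      .set("spark.memory.storageFraction", "0.5")
    val sc = new SparkContext(conf)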

Here are some relevant slides from a recent presentation I did in Toronto:
http://www.slideshare.net/cfregly/toronto-spark-meetup-dec-14-2015 (slide
63+)

This adaptiveness will be a continual theme with Spark moving forward.

For example, Spark 1.6 also includes adaptive query execution, including an
adaptive, hybrid join that can broadcast just the hot, popular keys using
BroadcastHashJoin while using the regular ShuffleHashJoin for low- to
medium-popularity keys - within the same job, adapting from stage to stage.
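
If you want to experiment with it, 1.6 ships an experimental flag for the
adaptive execution machinery (off by default); a hedged sketch:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("adaptive-example"))
    val sqlContext = new SQLContext(sc)
    // experimental in Spark 1.6 and off by default;
    // enables the adaptive query execution path
    sqlContext.setConf("spark.sql.adaptive.enabled", "true")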


On Wed, Dec 23, 2015 at 2:12 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> If you are creating a huge map on the driver, then spark.driver.memory
> should be set to a higher value to hold your map. Since you are going to
> broadcast this map, your Spark executors must have enough memory to hold
> this map as well, which can be set using the spark.executor.memory and
> spark.storage.memoryFraction configurations.
>
> Thanks
> Best Regards
>
> On Mon, Dec 21, 2015 at 5:50 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
>> I have a large Map that is assembled in the driver and broadcast to each
>> node.
>>
>> My question is how best to allocate memory for this.  The Driver has to
>> have enough memory for the Maps, but only one copy is serialized to each
>> node. What type of memory should I size to match the Maps? Is the broadcast
>> Map taking a little from each executor, all from every executor, or is
>> there something other than driver and executor memory I can size?
>
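
To put Akhil's advice into code, a hedged Scala sketch of the knobs involved
(sizes and names are illustrative; note that spark.driver.memory generally
must be passed at launch, e.g. spark-submit --driver-memory 4g, since the
driver JVM is already running by the time your SparkConf is read):

    import org.apache.spark.{SparkConf, SparkContext}

    // sizes are illustrative; tune to your actual map size
    val conf = new SparkConf()
      .setAppName("broadcast-map-example")
      .set("spark.executor.memory", "4g") // each executor holds one deserialized copy

    val sc = new SparkContext(conf)

    // stand-in for the large driver-side map
    val bigMap: Map[String, Int] = Map("apple" -> 1, "banana" -> 2)
    val bcast = sc.broadcast(bigMap)

    // tasks on each executor read the single local copy via bcast.value
    val hits = sc.parallelize(Seq("apple", "cherry"))
      .map(k => bcast.value.getOrElse(k, 0))
      .collect()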


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
