[ https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15971025#comment-15971025 ]

jin xing commented on SPARK-19659:
----------------------------------

[~cloud_fan]
I refined the PR. In the current change, I propose:
1. Track the average block size and also the outliers (blocks larger than 
2*avgSize) in MapStatus (a minimal sketch follows this list);
2. Request memory from *MemoryManager* before fetching blocks and release the 
memory back to *MemoryManager* when the *ManagedBuffer* is released;
3. Fetch remote blocks to disk when acquiring memory from *MemoryManager* 
fails; otherwise fetch to memory.
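
To make point 1 concrete, here is a minimal, self-contained sketch of 
compressing per-reducer block sizes into an average plus a map of outliers. 
The names (SizeTracking, CompressedSizes, sizeFor) are hypothetical 
illustrations, not the actual MapStatus internals:

{code}
// Hypothetical sketch: summarize one map task's output sizes as an average
// plus exact sizes for outliers (blocks larger than 2 * avgSize).
object SizeTracking {

  /** Compressed view of the sizes: cheap to ship with MapStatus. */
  case class CompressedSizes(avgSize: Long, outliers: Map[Int, Long]) {
    /** Approximate size for a reduce partition; exact if it was an outlier. */
    def sizeFor(reduceId: Int): Long = outliers.getOrElse(reduceId, avgSize)
  }

  def compress(sizes: Array[Long]): CompressedSizes = {
    val nonEmpty = sizes.filter(_ > 0)
    val avg = if (nonEmpty.isEmpty) 0L else nonEmpty.sum / nonEmpty.length
    // Keep exact sizes only for blocks larger than 2 * avgSize.
    val outliers = sizes.zipWithIndex.collect {
      case (size, reduceId) if size > 2 * avg => reduceId -> size
    }.toMap
    CompressedSizes(avg, outliers)
  }
}
{code}

For example, SizeTracking.compress(Array(10L, 12L, 11L, 500L)) keeps the 
500-byte block exactly and reports the integer average (133) for the rest, 
so the reduce side can still recognize the oversized block before fetching it.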

I adjusted the default value of "spark.memory.offHeap.size" to 384m (the same 
as spark.yarn.executor.memoryOverhead), which is used to initialize the 
off-heap memory pool. What's more, I think *spark.memory.offHeap.enabled* 
(documented in *configuration.md* as "If true, Spark will attempt to use 
off-heap memory for certain operations.") might be confusing, because 
regardless of whether it is true or false, remote blocks are shuffle-read to 
off-heap memory by default.
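
To make points 2 and 3 above concrete, below is a minimal sketch of the 
acquire-or-spill decision. SimpleMemoryPool, BlockFetcher and the FetchResult 
types are hypothetical stand-ins, not the actual MemoryManager or 
ShuffleBlockFetcherIterator API:

{code}
import java.io.File

sealed trait FetchResult
case class FetchedToMemory(bytes: Array[Byte]) extends FetchResult
case class FetchedToDisk(file: File) extends FetchResult

/** Toy all-or-nothing memory pool standing in for the off-heap pool. */
class SimpleMemoryPool(val poolSize: Long) {
  private var used = 0L
  def acquire(numBytes: Long): Boolean = synchronized {
    if (used + numBytes <= poolSize) { used += numBytes; true } else false
  }
  def release(numBytes: Long): Unit = synchronized {
    used = math.max(0L, used - numBytes)
  }
}

class BlockFetcher(pool: SimpleMemoryPool) {
  def fetch(blockSize: Long): FetchResult = {
    if (pool.acquire(blockSize)) {
      // Reservation succeeded: fetch into memory. The caller releases the
      // reservation when the buffer is done, mirroring "release the memory
      // to MemoryManager when ManagedBuffer is released".
      FetchedToMemory(new Array[Byte](blockSize.toInt))
    } else {
      // Pool exhausted (e.g. a skewed, oversized block): stream to disk so
      // the job pays extra I/O instead of dying with an OOM.
      FetchedToDisk(File.createTempFile("shuffle-fetch", ".data"))
    }
  }
}
{code}

The point is the fallback branch: when the reservation fails, the fetch 
degrades to extra disk I/O rather than failing the whole job.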

It would be great if you could take a rough look at the PR and comment, so I 
can know whether I'm heading in the right direction :)

> Fetch big blocks to disk when shuffle-read
> ------------------------------------------
>
>                 Key: SPARK-19659
>                 URL: https://issues.apache.org/jira/browse/SPARK-19659
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>         Attachments: SPARK-19659-design-v1.pdf, SPARK-19659-design-v2.pdf
>
>
> Currently the whole block is fetched into memory (off-heap by default) during 
> shuffle-read. A block is identified by (shuffleId, mapId, reduceId), so it can 
> be large in skew situations. If OOM happens during shuffle-read, the job is 
> killed and users are told to "Consider boosting 
> spark.yarn.executor.memoryOverhead". Adjusting the parameter and allocating 
> more memory can resolve the OOM, but this approach is not well suited to 
> production environments, especially data warehouses.
> Using Spark SQL as the data engine in a warehouse, users hope for a unified 
> parameter (e.g. memory) with less wasted resource (resource that is allocated 
> but not used).
> It's not always easy to predict skew situations; when they happen, it makes 
> sense to fetch remote blocks to disk for shuffle-read rather than kill the job 
> because of OOM. This approach was mentioned during the discussion in 
> SPARK-3019, by [~sandyr] and [~mridulm80].


