[ https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895411#comment-15895411 ]

jin xing edited comment on SPARK-19659 at 3/4/17 2:11 AM:
----------------------------------------------------------

[~rxin]
Thanks a lot for the comment.
Tracking the average size and also the outliers is a good idea.
But there can be multiple huge blocks creating too much pressure (e.g. 10% of 
the blocks are much bigger than the other 90%), and it is a little bit hard to 
decide how many outliers we should track.
If we track too many outliers, *MapStatus* can cost too much memory.
I think the benefit of tracking the max of each group of N/2000 consecutive 
blocks is that we keep *MapStatus* from costing too much memory (at most 
around 2000 bytes after compression) while still having all outliers under 
control. Do you think it's worth trying?
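
A minimal sketch of that idea (the object and method names here are hypothetical, not the actual *MapStatus* code; only the 2000-bucket cap comes from the numbers above): group the N block sizes into at most 2000 runs of consecutive blocks and keep only the max of each run, so the number of tracked sizes stays bounded no matter how large N gets.

{code:scala}
// Hypothetical sketch: cap the number of tracked sizes at 2000 by
// recording the max block size within each run of consecutive blocks.
object BucketedMaxSizes {
  val MaxBuckets = 2000

  /** Returns at most MaxBuckets values; entry i is the max size among
    * the i-th run of ceil(N / MaxBuckets) consecutive blocks. */
  def compress(blockSizes: Array[Long]): Array[Long] = {
    val n = blockSizes.length
    val runLength = math.max(1, (n + MaxBuckets - 1) / MaxBuckets)
    blockSizes.grouped(runLength).map(_.max).toArray
  }
}
{code}

A reducer could then use its run's max as an upper bound on its own block size, so no outlier is ever under-reported; the cost is over-estimating blocks that happen to share a run with an outlier.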



> Fetch big blocks to disk when shuffle-read
> ------------------------------------------
>
>                 Key: SPARK-19659
>                 URL: https://issues.apache.org/jira/browse/SPARK-19659
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>         Attachments: SPARK-19659-design-v1.pdf
>
>
> Currently the whole block is fetched into memory (off-heap by default) during 
> shuffle-read. A block is identified by (shuffleId, mapId, reduceId), so it 
> can be large in skew situations. If OOM happens during shuffle-read, the job 
> will be killed and users will be notified to "Consider boosting 
> spark.yarn.executor.memoryOverhead". Adjusting the parameter and allocating 
> more memory can resolve the OOM, but this approach is not well suited to a 
> production environment, especially a data warehouse.
> Using Spark SQL as the data engine in a warehouse, users hope for a unified 
> parameter (e.g. memory) with less resource wasted (resource allocated but 
> not used).
> Skew situations are not always easy to predict; when they happen, it makes 
> sense to fetch remote blocks to disk for shuffle-read rather than kill the 
> job because of OOM. This approach was mentioned during the discussion in 
> SPARK-3019, by [~sandyr] and [~mridulm80].
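
A minimal sketch of the proposed behavior (fetchBlock, maxInMemSize, and localDir are assumed names for illustration, not Spark's actual shuffle-read API): if a remote block's reported size exceeds a threshold, stream it to a local file instead of buffering it in memory.

{code:scala}
import java.io.{File, InputStream}
import java.nio.file.{Files, StandardCopyOption}

// Hypothetical sketch, not Spark's actual shuffle code.
// maxInMemSize is an assumed threshold for what counts as a "big" block.
def fetchBlock(
    blockId: String,
    size: Long,
    open: () => InputStream,
    maxInMemSize: Long,
    localDir: File): Either[Array[Byte], File] = {
  val in = open()
  try {
    if (size <= maxInMemSize) {
      // Small block: keep the current behavior and hold it in memory.
      Left(in.readAllBytes())
    } else {
      // Big block (likely skew): spill it to disk to avoid OOM.
      val file = new File(localDir, s"shuffle-$blockId.data")
      Files.copy(in, file.toPath, StandardCopyOption.REPLACE_EXISTING)
      Right(file)
    }
  } finally {
    in.close()
  }
}
{code}

The point of the threshold is that only the outlier blocks pay the disk-I/O cost, while the common case keeps today's in-memory path.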


