[ https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895411#comment-15895411 ]
jin xing edited comment on SPARK-19659 at 3/4/17 2:11 AM:
----------------------------------------------------------

[~rxin] Thanks a lot for the comment. Tracking the average size and also the outliers is a good idea. But there can be multiple huge blocks creating too much pressure (e.g. 10% of the blocks may be much bigger than the other 90%), and it is hard to decide how many outliers we should track. If we track too many outliers, *MapStatus* can cost too much memory. The benefit of tracking the max of each group of N/2000 consecutive blocks is that we bound the memory cost of *MapStatus* (at most around 2000 bytes after compression) while keeping all outliers under control. Do you think it's worth trying?

> Fetch big blocks to disk when shuffle-read
> ------------------------------------------
>
>                 Key: SPARK-19659
>                 URL: https://issues.apache.org/jira/browse/SPARK-19659
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>         Attachments: SPARK-19659-design-v1.pdf
>
>
> Currently the whole block is fetched into memory (off-heap by default) during
> shuffle-read. A block is identified by (shuffleId, mapId, reduceId), so it can
> be large in skew situations. If OOM happens during shuffle-read, the job is
> killed and users are advised to "Consider boosting
> spark.yarn.executor.memoryOverhead".
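The bucketed-max idea in the comment above could be sketched roughly as follows. This is a minimal illustration, not Spark's actual *MapStatus* implementation; the object and method names (`BucketedMaxSizes`, `compress`, `lookup`) and the bucket count are hypothetical.

```scala
// Hypothetical sketch: summarize N block sizes into at most ~2000 entries
// by keeping only the max of each group of consecutive blocks, so the
// MapStatus-like structure stays small regardless of N while every
// outlier is still covered by an upper bound.
object BucketedMaxSizes {
  val NumBuckets = 2000

  // One max per group of consecutive blocks; at most NumBuckets entries.
  def compress(sizes: Array[Long]): Array[Long] = {
    if (sizes.length <= NumBuckets) return sizes.clone()
    val span = math.ceil(sizes.length.toDouble / NumBuckets).toInt
    sizes.grouped(span).map(_.max).toArray
  }

  // Look up the recorded (upper-bound) size for one block index.
  def lookup(compressed: Array[Long], numBlocks: Int, reduceId: Int): Long = {
    if (numBlocks <= NumBuckets) compressed(reduceId)
    else {
      val span = math.ceil(numBlocks.toDouble / NumBuckets).toInt
      compressed(reduceId / span)
    }
  }
}
```

Since each bucket stores the max, `lookup` can only over-estimate a block's size, never under-estimate it, which is the safe direction when deciding whether a fetch might OOM.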
> Adjusting the parameter and allocating more memory can resolve the OOM, but
> this approach is not well suited to a production environment, especially a
> data warehouse. Using Spark SQL as the data engine in a warehouse, users hope
> for a unified parameter (e.g. memory) with less resource wasted (resource
> allocated but not used).
> It's not always easy to predict skew situations; when they happen, it makes
> sense to fetch remote blocks to disk for shuffle-read rather than kill the
> job because of OOM. This approach was mentioned during the discussion in
> SPARK-3019, by [~sandyr] and [~mridulm80]

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
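The core of the proposal in the description, routing a fetched block to disk instead of memory once it exceeds some size threshold, could look roughly like this. It is only a sketch of the idea; `FetchDestination`, `fetchBlock`, and the `maxInMemSize` knob are hypothetical names, not an actual Spark API or configuration.

```scala
import java.io.{File, FileOutputStream}

// Hypothetical sketch: decide, per block, whether the fetched bytes are
// buffered in memory (the current behavior) or spilled to a temp file
// (the proposed behavior for big blocks in skew situations).
object FetchDestination {
  // Left(file)  -> large block, written to disk
  // Right(data) -> small block, kept in memory
  def fetchBlock(blockSize: Long, maxInMemSize: Long,
                 data: => Array[Byte]): Either[File, Array[Byte]] = {
    if (blockSize > maxInMemSize) {
      val f = File.createTempFile("shuffle-block", ".data")
      val out = new FileOutputStream(f)
      try out.write(data) finally out.close()
      Left(f)
    } else {
      Right(data)
    }
  }
}
```

Combined with the bucketed-max size estimates, the decision can be made before the bytes arrive: the (over-estimated) block size picks the destination, so a skewed block never has to fit in memory at all.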