[ 
https://issues.apache.org/jira/browse/SPARK-19659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jin xing updated SPARK-19659:
-----------------------------
    Description: 
Currently the whole block is fetched into memory (off-heap by default) during 
shuffle-read. A block is identified by (shuffleId, mapId, reduceId), so it can 
be large in skew situations. If OOM happens during shuffle read, the job is 
killed and users are advised to "Consider boosting 
spark.yarn.executor.memoryOverhead". Adjusting the parameter and allocating more 
memory can resolve the OOM, but this approach is not well suited for production 
environments, especially data warehouses.

When using Spark SQL as the data engine in a warehouse, users hope to have a 
unified parameter (e.g. memory) with less wasted resource (resource that is 
allocated but never used).

It's not always easy to predict skew situations; when they happen, it makes 
sense to fetch remote blocks to disk for shuffle-read rather than kill the job 
because of OOM. This approach was mentioned during the discussion in 
SPARK-3019, by [~sandyr] and [~mridulm80].
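The idea above can be sketched roughly as follows. This is a hypothetical illustration, not Spark's actual shuffle code: blocks below an assumed threshold (the name and value of MAX_IN_MEMORY_BYTES are illustrative only) are buffered in memory as today, while larger (likely skewed) blocks are streamed to a temporary file so a single huge block cannot OOM the executor.

```java
import java.io.*;
import java.nio.file.*;

// Hedged sketch of fetch-to-disk for large shuffle blocks.
// Not Spark's real implementation; names and threshold are assumptions.
public class FetchToDiskSketch {
    // Assumed tunable: max block size to hold in memory during shuffle-read.
    static final long MAX_IN_MEMORY_BYTES = 48L * 1024 * 1024;

    // Fetch one remote block; the caller always gets an InputStream back,
    // whether the bytes ended up in memory or in a spill file on disk.
    static InputStream fetchBlock(InputStream remote, long blockSize)
            throws IOException {
        if (blockSize <= MAX_IN_MEMORY_BYTES) {
            // Small block: buffer fully in memory, as today.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            remote.transferTo(buf);
            return new ByteArrayInputStream(buf.toByteArray());
        }
        // Large (likely skewed) block: stream to disk instead of risking OOM.
        Path spill = Files.createTempFile("shuffle-block-", ".tmp");
        try (OutputStream out = Files.newOutputStream(spill)) {
            remote.transferTo(out);
        }
        spill.toFile().deleteOnExit();
        return Files.newInputStream(spill);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "skewed-partition-data".getBytes();
        try (InputStream in =
                 fetchBlock(new ByteArrayInputStream(data), data.length)) {
            System.out.println(new String(in.readAllBytes()));
        }
    }
}
```

Downstream readers consume the returned stream the same way in both cases, which keeps the memory-vs-disk decision local to the fetch path.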


> Fetch big blocks to disk when shuffle-read
> ------------------------------------------
>
>                 Key: SPARK-19659
>                 URL: https://issues.apache.org/jira/browse/SPARK-19659
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
