[jira] [Created] (SPARK-26907) Does ShuffledRDD Replication Work With External Shuffle Service

Han Altae-Tran (JIRA) Sun, 17 Feb 2019 00:10:08 -0800

Han Altae-Tran created SPARK-26907:
--------------------------------------

             Summary: Does ShuffledRDD Replication Work With External Shuffle Service
                 Key: SPARK-26907
                 URL: https://issues.apache.org/jira/browse/SPARK-26907
             Project: Spark
          Issue Type: Question
          Components: Block Manager, YARN
    Affects Versions: 2.3.2
            Reporter: Han Altae-Tran



I am interested in running Spark in high-replication environments for extreme 
fault tolerance (e.g. 10x replication). However, I have noticed that when 
groupBy or groupWith is followed by persist (with 10x replication), the failure 
of even a single node can cause the entire stage to fail with a 
FetchFailedException.

Is this because the External Shuffle Service writes intermediate shuffle data 
only to, and serves it only from, the local disk attached to the executor that 
generated it, so that Spark ignores any replicated copies of the data (from 
the persist) that could be served from elsewhere? If so, is there any way to 
increase the replication factor of the External Shuffle Service to make it 
fault tolerant?
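
The scenario in the question can be sketched as a toy model. This is plain Python, not Spark code, and every name in it (Cluster, write_shuffle, persist, FetchFailed, the node and block labels) is hypothetical. It assumes, as the question hypothesizes, that each shuffle map output exists as a single copy on the node that wrote it, while a persisted block is copied to several nodes.

```python
from dataclasses import dataclass, field


class FetchFailed(Exception):
    """Stand-in for Spark's FetchFailedException in this toy model."""


@dataclass
class Cluster:
    """Toy cluster: tracks live nodes and where each piece of data lives."""
    nodes: set
    # ASSUMPTION: one copy per shuffle map output, on the node that wrote it.
    shuffle_files: dict = field(default_factory=dict)   # map_id -> node
    # Persisted blocks may have many replicas.
    cached_blocks: dict = field(default_factory=dict)   # block_id -> set of nodes

    def write_shuffle(self, map_id, node):
        self.shuffle_files[map_id] = node

    def persist(self, block_id, replicas):
        self.cached_blocks[block_id] = set(replicas)

    def fail(self, node):
        self.nodes.discard(node)

    def fetch_shuffle(self, map_id):
        # Only the original writer can serve the shuffle output.
        node = self.shuffle_files[map_id]
        if node not in self.nodes:
            raise FetchFailed(f"shuffle output {map_id} was only on {node}")
        return node

    def read_cached(self, block_id):
        # Any surviving replica can serve a persisted block.
        live = self.cached_blocks[block_id] & self.nodes
        if not live:
            raise KeyError(f"no live replica of {block_id}")
        return sorted(live)[0]


names = [f"node{i}" for i in range(10)]
cluster = Cluster(nodes=set(names))
cluster.write_shuffle(0, "node0")      # single copy of the shuffle output
cluster.persist("rdd_1_0", names)      # 10x replicated persisted block
cluster.fail("node0")

print(cluster.read_cached("rdd_1_0"))  # served by a surviving replica
try:
    cluster.fetch_shuffle(0)
except FetchFailed as err:
    print("FetchFailed:", err)
```

Under these assumptions, losing node0 leaves the persisted block readable from another replica but makes the shuffle output unreachable, which matches the reported symptom. Whether Spark 2.3.2 actually behaves this way is exactly what the question asks.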



