[ https://issues.apache.org/jira/browse/SPARK-27637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-27637:
-----------------------------------

    Assignee: feiwang

> If exception occured while fetching blocks by netty block transfer service,
> check whether the relative executor is alive before retry
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-27637
>                 URL: https://issues.apache.org/jira/browse/SPARK-27637
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.3.3, 2.4.3
>            Reporter: feiwang
>            Assignee: feiwang
>            Priority: Major
>             Fix For: 3.0.0
>
>
> There are two kinds of shuffle client: blockTransferService and
> externalShuffleClient.
>
> The externalShuffleClient talks to an external shuffle service, which
> serves shuffle block data regardless of the state of the executors.
>
> The blockTransferService is used to fetch broadcast blocks, and to fetch
> shuffle data when the external shuffle service is not enabled.
>
> When fetching data through the blockTransferService, the shuffle client
> connects to the corresponding executor's BlockManager, so if that executor
> is dead the fetch can never succeed.
>
> When spark.shuffle.service.enabled and spark.dynamicAllocation.enabled are
> both true, an executor is removed once it has been idle for longer than
> idleTimeout.
>
> If a blockTransferService connects to the corresponding executor
> successfully, but the executor is removed just as the broadcast block fetch
> begins, the client will retry (see RetryingBlockFetcher), which is
> ineffective: the executor is gone, so every retry must fail.
>
> If spark.shuffle.io.retryWait and spark.shuffle.io.maxRetries are large,
> e.g. 30s and 10 retries, this wastes 5 minutes.
>
> So I think we should check whether the corresponding executor is alive
> before retrying.
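The proposed guard can be sketched as follows. This is a minimal illustration of the retry loop with a liveness check bolted on, not Spark's actual RetryingBlockFetcher code; the class name RetrySketch and the liveness predicate are hypothetical, and the real fix would query the driver for the executor's state.

```java
import java.util.function.BooleanSupplier;

// Sketch: before each retry, consult a liveness predicate for the remote
// executor. If the executor is dead, abort immediately instead of burning
// through spark.shuffle.io.maxRetries * spark.shuffle.io.retryWait.
public class RetrySketch {
    private final int maxRetries;        // analog of spark.shuffle.io.maxRetries
    private final BooleanSupplier alive; // hypothetical executor-liveness probe

    public RetrySketch(int maxRetries, BooleanSupplier alive) {
        this.maxRetries = maxRetries;
        this.alive = alive;
    }

    /**
     * Runs {@code attempt} up to maxRetries + 1 times.
     * Returns the number of attempts made on success or exhaustion,
     * or -1 if aborted early because the executor was reported dead.
     */
    public int fetch(Runnable attempt) {
        for (int i = 0; i <= maxRetries; i++) {
            try {
                attempt.run();
                return i + 1; // fetch succeeded on attempt i + 1
            } catch (RuntimeException e) {
                if (!alive.getAsBoolean()) {
                    return -1; // executor dead: retrying cannot succeed
                }
                // executor still alive: fall through and retry
                // (the real fetcher also sleeps retryWait ms here)
            }
        }
        return maxRetries + 1; // retries exhausted against a live executor
    }

    public static void main(String[] args) {
        // Executor reported dead after the first failure: abort at once,
        // rather than retrying 10 times with a 30s wait (5 wasted minutes).
        RetrySketch dead = new RetrySketch(10, () -> false);
        int r1 = dead.fetch(() -> { throw new RuntimeException("connection reset"); });
        System.out.println(r1); // -1

        // Executor alive but the fetch keeps failing: all retries are spent.
        RetrySketch live = new RetrySketch(3, () -> true);
        int r2 = live.fetch(() -> { throw new RuntimeException("timeout"); });
        System.out.println(r2); // 4
    }
}
```

With retryWait = 30s and maxRetries = 10 the blind-retry path blocks for 30s x 10 = 300s, which is the 5 minutes the report mentions; the liveness check turns that into a fast failure.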
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org