[ https://issues.apache.org/jira/browse/SPARK-31373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuchen Feng updated SPARK-31373:
--------------------------------
    Issue Type: Question  (was: Bug)

> Cluster tried to fetch blocks from blacklisted node of previous stage
> ---------------------------------------------------------------------
>
>                 Key: SPARK-31373
>                 URL: https://issues.apache.org/jira/browse/SPARK-31373
>             Project: Spark
>          Issue Type: Question
>          Components: Block Manager
>    Affects Versions: 2.4.2
>            Reporter: Yuchen Feng
>            Priority: Major
>
> We enabled blacklisting in our Spark application, but recently we saw a
> weird issue.
> Our code looks like this:
> rdd.repartition(...).mapPartitions(...).groupByKey(...).map().collect()
> In the mapPartitions stage, some executors hit the exception "Can't
> connect to host xxxxxx: Connection reset by peer" and their tasks failed,
> so all executors on that node were blacklisted, as well as the node
> itself. These executors had completed some tasks before being blacklisted.
> Then in the next stage (groupByKey(...).map()), the application failed
> with a fetch failure: an IndexOutOfBoundsException was thrown when a
> healthy executor tried to fetch a block from one of the blacklisted
> executors above.
> It happened multiple times.
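
For readers trying to reproduce the setup described in the report, the fragment below is a minimal sketch of blacklist-related settings that could be in play on Spark 2.4.x. The report only says that blacklisting was enabled, so every value here is an assumption, not the reporter's actual configuration. As far as I understand, blacklisting on task failures only affects where new tasks are scheduled; it does not by itself discard shuffle output that a blacklisted executor already wrote, so a later stage may still try to fetch from it.

```properties
# Hypothetical spark-defaults.conf fragment (assumed values, not taken from the report).

# Enables blacklisting of executors and nodes after repeated task failures.
spark.blacklist.enabled                            true

# Experimental in 2.4.x: blacklist an executor immediately when a fetch failure occurs.
# Assumption: the reporter did not set this; it is listed only because the failure
# in the report surfaced as a fetch failure.
spark.blacklist.application.fetchFailure.enabled   true

# If the external shuffle service is on, shuffle files are served per node rather
# than per executor. Assumption: its state in the reporter's cluster is unknown.
spark.shuffle.service.enabled                      false
```

Whether any of these settings changes the behaviour in the report is untested here; they are listed only as the knobs most directly related to blacklisting and shuffle fetches.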