[ 
https://issues.apache.org/jira/browse/SPARK-31373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31373.
----------------------------------
    Resolution: Invalid

Let's ask questions on the mailing list rather than filing them as issues (see 
also https://spark.apache.org/community.html)

> Cluster tried to fetch blocks from blacklisted node of previous stage
> ---------------------------------------------------------------------
>
>                 Key: SPARK-31373
>                 URL: https://issues.apache.org/jira/browse/SPARK-31373
>             Project: Spark
>          Issue Type: Question
>          Components: Block Manager
>    Affects Versions: 2.4.2
>         Environment: EMR cluster with r5.4xlarge and r5.8xlarge instances
>            Reporter: Yuchen Feng
>            Priority: Major
>
> We enabled blacklisting in our Spark application, but recently we hit a 
> weird issue.
> Our code looks like
>   {{rdd.repartition(...).mapPartitions(...).groupByKey(...).map().collect()}}
> In the mapPartitions stage, some executors hit the exception "Can't connect 
> to host xxxxxx: Connection reset by peer" and the tasks on them failed, so 
> all executors on that node were blacklisted, as was the node itself. These 
> executors had completed some tasks before being blacklisted.
> Then, in the next stage (groupByKey(...).map()), the application failed with 
> a block fetch failure: an IndexOutOfBoundsException was thrown when a healthy 
> executor tried to fetch a block from one of the blacklisted executors.
> This happened multiple times.
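
For anyone picking this question up on the mailing list, here is a minimal sketch of the shape of job described above, assuming Spark 2.4 property names. The report does not include the actual configuration or functions, so the input data, partition count, lambdas, object name, and app name below are placeholders. The report also does not say whether spark.blacklist.application.fetchFailure.enabled was set; it is shown only as a setting that may be worth checking.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical repro skeleton; names and values are illustrative only.
object BlacklistFetchSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("blacklist-fetch-sketch")
      // The report states that blacklisting was enabled.
      .set("spark.blacklist.enabled", "true")
      // Assumption: not mentioned in the report. When true, an executor is
      // blacklisted for the whole application as soon as a fetch failure
      // from it is observed.
      .set("spark.blacklist.application.fetchFailure.enabled", "true")

    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(conf)

    // Placeholder input; the real RDD is not described in the report.
    val rdd = sc.parallelize(1 to 1000)

    val result = rdd
      .repartition(8)                                    // shuffle
      .mapPartitions(iter => iter.map(x => (x % 10, x))) // stage where tasks failed
      .groupByKey()                                      // shuffle; reads map output of the previous stage
      .map { case (key, values) => (key, values.size) }
      .collect()

    result.foreach(println)
    sc.stop()
  }
}
```

The point of the sketch is that groupByKey reads shuffle output written by the mapPartitions stage, including output from tasks that completed on executors that were later blacklisted.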



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
