[ https://issues.apache.org/jira/browse/SPARK-29257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kent Yao updated SPARK-29257:
-----------------------------
    Attachment: image-2019-09-26-16-44-48-554.png

> All task attempts scheduled to the same executor inevitably access the same
> bad disk
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-29257
>                 URL: https://issues.apache.org/jira/browse/SPARK-29257
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.3.4, 2.4.4
>            Reporter: Kent Yao
>            Priority: Major
>         Attachments: image-2019-09-26-16-44-48-554.png
>
> We have an HDFS/YARN cluster with about 2k~3k nodes; each node has 12 * 8T
> local disks for storage and shuffle. Sometimes one or more disks go into a
> bad state during computation. Sometimes this causes job-level failure, and
> sometimes it does not.
> The following picture shows one failed job in which all 4 task attempts
> were delivered to the same node and failed with almost the same exception
> while writing the temporary index file to the same bad disk.
>
> This is caused by two things:
> # As the figure shows, this node has the best data locality for reading
> the input from HDFS. With the default spark.locality.wait (3s) in effect,
> there is a high probability that every attempt will be scheduled to this
> node.
> # The index and data file names for a particular shuffle map task are
> fixed: each is formed from the shuffle id, the map id, and the noop reduce
> id, which is always 0. The root local dir is picked by taking the fixed
> file name's non-negative hash code modulo the number of disks, so this
> choice is also fixed. Even with 12 disks in total and only one of them
> broken, once the broken disk is picked, every subsequent attempt of this
> task inevitably picks it again.
>
>
> !image-2019-09-26-16-44-48-554.png!
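The deterministic disk choice described above can be sketched as follows. This is a simplified, hedged illustration of the logic the report describes (a non-negative hash of the fixed shuffle file name modulo the disk count); the class, file name, and disk count here are illustrative, not Spark's actual source.

```java
public class DiskPick {
    // Simplified stand-in for Spark's non-negative hash: JVM String.hashCode
    // masked to a non-negative value (Integer.MIN_VALUE has no positive abs).
    static int nonNegativeHash(String s) {
        int h = s.hashCode();
        return h != Integer.MIN_VALUE ? Math.abs(h) : 0;
    }

    public static void main(String[] args) {
        int numLocalDirs = 12; // e.g., 12 local disks on the node

        // Hypothetical index file name for shuffle 3, map task 7; the
        // reduce id component is always 0, so the name never changes
        // across task attempts.
        String fileName = "shuffle_3_7_0.index";

        // First attempt and a retry both compute the same dir index,
        // because the input string is identical.
        int firstAttemptDir = nonNegativeHash(fileName) % numLocalDirs;
        int retryDir = nonNegativeHash(fileName) % numLocalDirs;

        System.out.println("same dir on retry: " + (firstAttemptDir == retryDir));
        System.out.println("dir in [0,12): " + (firstAttemptDir >= 0 && firstAttemptDir < numLocalDirs));
    }
}
```

Since nothing in the formula varies per attempt, a retry scheduled on the same node lands on the same local dir, which is exactly why one bad disk can fail all four attempts.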
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org