Kai-Hsun Chen created SPARK-39956:
-------------------------------------

             Summary: Determine task failures based on ExecutorExitCode
                 Key: SPARK-39956
                 URL: https://issues.apache.org/jira/browse/SPARK-39956
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Kai-Hsun Chen

An executor can exit for many reasons. However, the driver currently assumes that every executor exit is caused by a task failure. This assumption is wrong. For example, when DiskBlockManager fails to create a directory, it shuts down the executor's JVM with the exit code {{DISK_STORE_FAILED_TO_CREATE_DIR}}. When the driver receives {{DISK_STORE_FAILED_TO_CREATE_DIR}}, the executor exit was very likely caused by a hardware failure rather than a task failure.
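
The idea could be sketched roughly as follows. This is only an illustration, not Spark's actual driver code: the object {{ExitCodeSketch}}, the {{ExitCause}} type, and the {{classify}} helper are hypothetical names, and the numeric value given to {{DISK_STORE_FAILED_TO_CREATE_DIR}} is an assumption that should be checked against Spark's ExecutorExitCode.

```scala
// Hypothetical sketch: none of these names are part of Spark's API.
object ExitCodeSketch {
  // Assumed value for illustration; verify against Spark's ExecutorExitCode.
  val DISK_STORE_FAILED_TO_CREATE_DIR: Int = 53

  // Whether an executor exit should be attributed to the running task.
  sealed trait ExitCause
  case object TaskFailure extends ExitCause
  case object NotTaskFailure extends ExitCause

  // Exit codes that point at the host or environment rather than the task.
  private val nonTaskExitCodes: Set[Int] = Set(DISK_STORE_FAILED_TO_CREATE_DIR)

  // Unknown codes fall back to TaskFailure, which matches the driver's
  // current behaviour as described in this issue.
  def classify(exitCode: Int): ExitCause =
    if (nonTaskExitCodes.contains(exitCode)) NotTaskFailure else TaskFailure
}
```

Defaulting to {{TaskFailure}} keeps existing behaviour for every exit code that has not been explicitly identified as unrelated to the task, so only well-understood codes change how the exit is counted.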