Kai-Hsun Chen created SPARK-39956:
-------------------------------------

             Summary: Determine task failures based on ExecutorExitCode
                 Key: SPARK-39956
                 URL: https://issues.apache.org/jira/browse/SPARK-39956
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Kai-Hsun Chen


An executor can exit for many reasons. However, the driver assumes that 
every executor exit is caused by a task failure, and that assumption is 
wrong. For example, when DiskBlockManager fails to create a directory, it 
shuts down the executor’s JVM with the exit code 
{{DISK_STORE_FAILED_TO_CREATE_DIR}}. When the driver receives this exit 
code, the executor exit was most likely caused by a hardware failure rather 
than a task failure.
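
A minimal sketch of the idea, written in Python for brevity rather than as 
Spark code: the driver could classify an executor exit by its exit code 
before deciding whether to count it as a task failure. The names 
ExitCause, NON_TASK_EXIT_CODES, classify_executor_exit and 
counts_toward_task_failures are hypothetical, and the numeric value 53 is 
an assumed stand-in for DISK_STORE_FAILED_TO_CREATE_DIR used only for 
illustration.

```python
from enum import Enum

# Assumed numeric value, for illustration only; it stands in for
# ExecutorExitCode.DISK_STORE_FAILED_TO_CREATE_DIR.
DISK_STORE_FAILED_TO_CREATE_DIR = 53


class ExitCause(Enum):
    """Hypothetical classification of why an executor exited."""
    TASK_FAILURE = "task_failure"
    NOT_TASK_RELATED = "not_task_related"


# Exit codes this sketch treats as unrelated to the tasks that were running.
NON_TASK_EXIT_CODES = frozenset({DISK_STORE_FAILED_TO_CREATE_DIR})


def classify_executor_exit(exit_code: int) -> ExitCause:
    """Map an executor exit code to a likely cause."""
    if exit_code in NON_TASK_EXIT_CODES:
        return ExitCause.NOT_TASK_RELATED
    # Fall back to the driver's existing assumption for all other codes.
    return ExitCause.TASK_FAILURE


def counts_toward_task_failures(exit_code: int) -> bool:
    """Return True if the exit should be counted as a task failure."""
    return classify_executor_exit(exit_code) is ExitCause.TASK_FAILURE
```

With this sketch, an exit with the assumed code 53 would not be counted 
against the task, while any other exit code keeps today's behavior.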



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
