Kay Ousterhout created SPARK-11178:
--------------------------------------

             Summary: Improve naming around task failures in scheduler code
                 Key: SPARK-11178
                 URL: https://issues.apache.org/jira/browse/SPARK-11178
             Project: Spark
          Issue Type: Improvement
          Components: Scheduler
    Affects Versions: 1.5.1
            Reporter: Kay Ousterhout
            Assignee: Kay Ousterhout
            Priority: Trivial


Commit af3bc59d1f5d9d952c2d7ad1af599c49f1dbdaf0 introduced new functionality so 
that if an executor dies for a reason that's not caused by one of the tasks 
running on the executor (e.g., due to preemption), Spark doesn't count the 
failure towards the maximum number of failures for the task.  That commit 
introduced some vague naming that I think we should fix (a sketch of clearer 
naming follows the two items below); in particular:
    
(1) The variable "isNormalExit" is used to refer to cases where the 
executor died for a reason unrelated to the tasks running on the machine.  The 
problem with the existing name is that it's not clear (at least to me!) what it 
means for an exit to be "normal".
    
(2) The variable "shouldEventuallyFailJob" is used to determine whether a 
task's failure should be counted towards the maximum number of failures allowed 
for a task before the associated Stage is aborted.  The problem with the 
existing name is that it can be misread as implying that the task's failure 
should immediately cause the stage to fail because it is somehow fatal (as is 
the case for a fetch failure, for example: if a task fails because of a fetch 
failure, there's no point in retrying, and the whole stage should be failed).
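
To make the intended semantics concrete, here is a minimal sketch of how the 
scheduler logic could read with more descriptive names.  The names 
"exitCausedByApp" and "countTowardsTaskFailures" are only illustrative 
suggestions, not necessarily what the final patch will use:

{code:scala}
import scala.collection.mutable

// Illustrative names only; these are suggestions, not the final patch.
// exitCausedByApp = false covers executor deaths that are unrelated to the
// tasks running on the executor (e.g., the executor was preempted).
case class ExecutorExited(exitCode: Int, exitCausedByApp: Boolean)

class TaskSetSketch(maxTaskFailures: Int) {
  // Per-task failure counts, keyed by task index.
  private val numFailures = mutable.Map[Int, Int]().withDefaultValue(0)

  // countTowardsTaskFailures is false when the failure was external to the
  // task itself, so such a failure never consumes the task's retry budget.
  def handleFailedTask(taskIndex: Int, countTowardsTaskFailures: Boolean): Unit = {
    if (countTowardsTaskFailures) {
      numFailures(taskIndex) += 1
      if (numFailures(taskIndex) >= maxTaskFailures) {
        abortStage(s"Task $taskIndex failed $maxTaskFailures times; aborting")
      }
    }
    // Either way, the task is resubmitted for another attempt.
  }

  // When an executor is lost, only count the failure against its tasks if
  // the application actually caused the exit.
  def executorLost(taskIndexes: Seq[Int], reason: ExecutorExited): Unit = {
    taskIndexes.foreach { idx =>
      handleFailedTask(idx, countTowardsTaskFailures = reason.exitCausedByApp)
    }
  }

  private def abortStage(reason: String): Unit = {
    // Placeholder: the real scheduler would fail the stage and its job.
    println(reason)
  }
}
{code}

With names like these it's immediately clear that, e.g., a preempted 
executor's tasks get retried without eating into their failure budget.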


