[ https://issues.apache.org/jira/browse/SPARK-19934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated SPARK-19934:
-----------------------------
    Description: 
{code}
def handleRemovedExecutor(executorId: String): Unit = {
    // We intentionally do not clean up executors that are already blacklisted in
    // nodeToBlacklistedExecs, so that if another executor on the same node gets
    // blacklisted, we can blacklist the entire node.  We also can't clean up
    // executorIdToBlacklistStatus, so we can eventually remove the executor after
    // the timeout.  Despite not clearing those structures here, we don't expect
    // they will grow too big since you won't get too many executors on one node,
    // and the timeout will clear it up periodically in any case.
    executorIdToFailureList -= executorId
  }
{code}
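
For context, the periodic timeout cleanup the comment relies on might look roughly like the sketch below. This is not the actual Spark code; {{BlacklistEntry}} and the simplified map declarations are illustrative stand-ins for the corresponding structures in BlacklistTracker.scala.

{code}
import scala.collection.mutable

object BlacklistTimeoutSketch {
  // Simplified stand-in for the per-executor blacklist status kept by the tracker.
  case class BlacklistEntry(node: String, expiryTime: Long)

  val executorIdToBlacklistStatus = mutable.HashMap.empty[String, BlacklistEntry]
  val nodeToBlacklistedExecs = mutable.HashMap.empty[String, mutable.HashSet[String]]

  // Entries left behind by handleRemovedExecutor are still dropped once their
  // expiry time passes, which is why these maps stay bounded.
  def applyBlacklistTimeout(now: Long): Unit = {
    val expired = executorIdToBlacklistStatus.filter { case (_, entry) => entry.expiryTime < now }
    expired.foreach { case (execId, entry) =>
      executorIdToBlacklistStatus.remove(execId)
      nodeToBlacklistedExecs.get(entry.node).foreach { execs =>
        execs.remove(execId)
        if (execs.isEmpty) nodeToBlacklistedExecs.remove(entry.node)
      }
    }
  }
}
{code}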

I think the comment should read:

{code}
// We intentionally do not clean up executors that are already blacklisted in
// nodeToBlacklistedExecs, so that if {spark.blacklist.application.maxFailedExecutorsPerNode} - 1
// more executors on the same node get blacklisted, we can blacklist the entire node.
{code}

For reference, the design doc https://docs.google.com/document/d/1R2CVKctUZG9xwD67jkRdhBR4sCgccPR2dhTYSRXFEmg/edit gives the rule for promoting a node to the application-level blacklist:
Nodes are placed into a blacklist for the entire application when the number of blacklisted executors goes over spark.blacklist.application.maxFailedExecutorsPerNode (default 2).
The current comment only describes the behavior for the default value.
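
To make the rule concrete, here is a minimal sketch of the node-level check the comment is describing. It is not the real implementation; {{maxFailedExecutorsPerNode}}, {{blacklistedExecsOnNode}} and {{blacklistNode}} are assumed names, and the check is written as {{>=}} to match the reading that one already-blacklisted executor plus (threshold - 1) more triggers node blacklisting.

{code}
object NodeBlacklistRuleSketch {
  // Stand-in for spark.blacklist.application.maxFailedExecutorsPerNode (default 2).
  val maxFailedExecutorsPerNode = 2

  // Blacklist the node once the number of blacklisted executors on it reaches the
  // configured threshold, rather than the "another executor" wording that only
  // matches the default of 2.
  def maybeBlacklistNode(
      node: String,
      blacklistedExecsOnNode: Set[String],
      blacklistNode: String => Unit): Unit = {
    if (blacklistedExecsOnNode.size >= maxFailedExecutorsPerNode) {
      blacklistNode(node)
    }
  }

  def main(args: Array[String]): Unit = {
    // With the default threshold of 2: one executor already blacklisted plus one
    // more (threshold - 1 additional) is enough to blacklist the whole node.
    maybeBlacklistNode("host1", Set("exec-1", "exec-2"), n => println(s"blacklisting node $n"))
  }
}
{code}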

> Code comments are not very clear in BlacklistTracker.scala
> -----------------------------------------------------------
>
>                 Key: SPARK-19934
>                 URL: https://issues.apache.org/jira/browse/SPARK-19934
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 2.1.0
>            Reporter: zhoukang
>            Priority: Trivial
>


