GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/22923

    [SPARK-25910][CORE] accumulator updates from previous stage attempt should not log error

    ## What changes were proposed in this pull request?
    
    For shuffle map stages, we may have multiple attempts, of which only the
    latest is active. However, the scheduler still accepts successful tasks
    from previous attempts in order to speed up execution.
    
    Each stage attempt has a `StageInfo` instance, which contains
    `TaskMetrics`. `TaskMetrics` holds a set of accumulators that track metrics
    such as CPU time. However, a stage only keeps the `StageInfo` of the latest
    attempt, which means the `StageInfo` of previous attempts will be GCed and
    the accumulators of their `TaskMetrics` will be cleaned.
    
    This causes a problem: when the scheduler accepts a successful task from a
    previous attempt and tries to update accumulators, it may fail to get the
    accumulators from `AccumulatorContext` because they have already been
    cleaned. We may then see an error log like:
    ```
    18/10/21 15:30:24 INFO ContextCleaner: Cleaned accumulator 2868 (name: internal.metrics.executorDeserializeTime)
    18/10/21 15:30:24 ERROR DAGScheduler: Failed to update accumulators for task 7927
    org.apache.spark.SparkException: attempted to access non-existent accumulator 2868
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1267)
    ...
    ```
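    
    The failure mode above can be modeled with a toy registry. This is only a
    sketch: `ToyAccumulatorRegistry` and its methods are hypothetical names
    invented for illustration and are not Spark's `AccumulatorContext` API. It
    assumes that cleaning simply removes the entry, so a later update from a
    late task finds nothing to update.
    
    ```scala
    import scala.collection.mutable

    // Hypothetical stand-in for a registry of live accumulators.
    // This is NOT Spark's AccumulatorContext; names are illustrative only.
    object ToyAccumulatorRegistry {
      private val live = mutable.Map.empty[Long, Long]

      def register(id: Long): Unit = { live.update(id, 0L) }

      // Models the cleaner dropping an accumulator whose owner was GCed.
      def clean(id: Long): Unit = { live.remove(id); () }

      // Left carries an error message when the accumulator no longer exists.
      def add(id: Long, delta: Long): Either[String, Long] =
        live.get(id) match {
          case Some(current) =>
            val updated = current + delta
            live.update(id, updated)
            Right(updated)
          case None =>
            Left(s"attempted to access non-existent accumulator $id")
        }
    }
    ```
    
    In this model, an `add` issued after `clean` for the same id returns a
    `Left`, which corresponds to the error being logged for a task that
    actually succeeded.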
    
    This PR proposes a simple fix: when the scheduler receives successful tasks
    from previous attempts, don't update accumulators. Accumulators of previous
    stage attempts are no longer tracked, so we don't need to update them.
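    
    A minimal sketch of the idea described above, under stated assumptions:
    `ToyTask`, `ToyStage`, and `ToyScheduler` are hypothetical types made up
    for illustration, and the comparison on the attempt id is an assumed way
    to detect a late task. It is not the code in this patch.
    
    ```scala
    // Hypothetical types for illustration; these are not Spark's classes.
    final case class ToyTask(stageId: Int, stageAttemptId: Int, metricDelta: Long)

    final class ToyStage(val id: Int) {
      var latestAttemptId: Int = 0
      var cpuTime: Long = 0L     // stands in for an accumulator of the latest attempt
      var finishedTasks: Int = 0 // completed tasks, regardless of attempt
    }

    object ToyScheduler {
      // The task's result is accepted either way, but metrics are only
      // folded in when the task belongs to the latest stage attempt.
      // Returns true if the accumulator was updated.
      def handleSuccess(stage: ToyStage, task: ToyTask): Boolean = {
        stage.finishedTasks += 1
        val fromLatestAttempt = task.stageAttemptId == stage.latestAttemptId
        if (fromLatestAttempt) stage.cpuTime += task.metricDelta
        fromLatestAttempt
      }
    }
    ```
    
    With this guard, a late task still counts as finished work, while the
    lookup of accumulators that may already be cleaned is skipped entirely.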
    
    
    ## How was this patch tested?
    
    A new test was added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark late-task

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22923
    
----
commit 07f900cf845662186f8d1daea3be9abe2633d5c0
Author: Wenchen Fan <wenchen@...>
Date:   2018-11-01T15:40:14Z

    accumulator updates from previous stage attempt

commit 4d9cbe043604e76b6367e4ecb42d0d36437d1792
Author: Wenchen Fan <wenchen@...>
Date:   2018-11-01T16:04:41Z

    different fix

----


