[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...

vanzin Wed, 20 Jun 2018 10:19:36 -0700

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21577
  
    I was referring to a race caused by asynchronously killing speculative tasks.
    Granted, it's incredibly unlikely to occur in real life:
    
    - in s1a1, t1 and t2 are started for the same partition; t1 succeeds and a kill is sent for t2
    - s1 finishes, coordinator state is cleared for that stage
    - s2a1 fails, causes s1 to be re-submitted
    - t2 finishes before that kill message arrives and is allowed to commit.
    
    If that can happen, it would generate a duplicate map output, but my guess
    (hope?) is that the map output tracker would only keep one of them.
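
The sequence above can be replayed against a toy model to see where the second commit slips through. This is only a sketch under stated assumptions: `ToyCommitCoordinator` and `simulate_race` are hypothetical names, not Spark code, and the model assumes commit state is keyed by stage id alone and is recreated from scratch when the stage is re-submitted. Whether the real coordinator behaves this way is exactly the open question in the comment.

```python
class ToyCommitCoordinator:
    """Hypothetical, simplified model of a commit coordinator (not Spark's class)."""

    def __init__(self):
        # Assumption: state is keyed by stage id only, with no stage attempt number.
        # stage id -> {partition -> task authorized to commit}
        self._authorized = {}

    def stage_start(self, stage_id):
        # A new (or re-submitted) stage starts with empty state.
        self._authorized[stage_id] = {}

    def stage_end(self, stage_id):
        # State for the stage is cleared when the stage finishes.
        self._authorized.pop(stage_id, None)

    def can_commit(self, stage_id, partition, task):
        state = self._authorized.get(stage_id)
        if state is None:
            return False  # unknown stage: deny the commit
        # The first task to ask for a partition wins; later askers are denied.
        return state.setdefault(partition, task) == task


def simulate_race():
    """Replay the four steps from the comment and return the tasks that committed."""
    coord = ToyCommitCoordinator()
    committed = []

    coord.stage_start(1)                 # s1a1: t1 and t2 run for partition 0
    if coord.can_commit(1, 0, "t1"):     # t1 succeeds and commits
        committed.append("t1")
    # A kill is sent for t2, but it is asynchronous and still in flight.

    coord.stage_end(1)                   # s1 finishes, state is cleared
    coord.stage_start(1)                 # s2a1 fails, s1 is re-submitted

    if coord.can_commit(1, 0, "t2"):     # t2 finishes before the kill arrives
        committed.append("t2")
    return committed


if __name__ == "__main__":
    print(simulate_race())
```

In this model both t1 and t2 end up committing the same partition, because the re-submitted stage has no memory of t1's earlier commit. Keying the state by stage attempt as well as stage id would be one way to make the late request from t2 distinguishable in the toy model.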


