Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21577
  
    I was referring to a race caused by asynchronously killing speculative 
tasks. Granted, it's incredibly unlikely to occur in real life:
    
    - in s1a1, t1 and t2 are started for the same partition; t1 succeeds and a 
kill is sent for t2
    - s1 finishes, coordinator state is cleared for that stage
    - s2a1 fails, causes s1 to be re-submitted
    - t2 finishes before that kill message arrives and is allowed to commit.
    
    If that can happen, it would generate a duplicate map output, but my guess 
(hope?) is that the map output tracker would only keep one of them.
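
To make the interleaving concrete, here is a rough sketch of the four steps as a toy model in Python. It is not Spark code: `ToyCommitCoordinator`, `replay_race`, and the stage-keyed state are hypothetical stand-ins, and the sketch assumes the coordinator forgets everything about a stage when it ends and starts from fresh state when the stage is re-submitted.

```python
# Toy model only. ToyCommitCoordinator is a hypothetical stand-in and does
# not reproduce Spark's actual commit coordinator API.
class ToyCommitCoordinator:
    def __init__(self):
        # stage id -> {partition -> task that was granted the commit}
        self.stages = {}

    def stage_start(self, stage):
        # Assumption: a (re-)submitted stage always starts with empty state.
        self.stages[stage] = {}

    def stage_end(self, stage):
        # Assumption: all commit state for the stage is dropped here.
        self.stages.pop(stage, None)

    def can_commit(self, stage, partition, task):
        committers = self.stages.get(stage)
        if committers is None:
            return False  # unknown stage: deny the commit
        # The first task to ask for a partition wins; later ones are denied.
        winner = committers.setdefault(partition, task)
        return winner == task


def replay_race():
    coordinator = ToyCommitCoordinator()
    map_outputs = []  # (partition, task) for every task allowed to commit

    def try_commit(stage, partition, task):
        if coordinator.can_commit(stage, partition, task):
            map_outputs.append((partition, task))

    coordinator.stage_start("s1")   # s1a1 starts; t1 and t2 run partition 0
    try_commit("s1", 0, "t1")       # t1 succeeds; the kill for t2 is in flight
    coordinator.stage_end("s1")     # s1 finishes, coordinator state is cleared
    coordinator.stage_start("s1")   # s2a1 fails, s1 is re-submitted
    try_commit("s1", 0, "t2")       # t2 finishes before the kill arrives
    return map_outputs
```

Under those assumptions `replay_race()` records a commit from both t1 and t2 for partition 0, which is the duplicate map output described above. If the tracker keeps a single entry per partition (modelled here as `dict(replay_race())`), only one of the two survives, which is the outcome I am hoping for.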


