Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21577
I was referring to a race caused by asynchronously killing speculative
tasks. Granted, it's incredibly unlikely to occur in real life:
- in s1a1, t1 and t2 are started for the same partition, t1 succeeds, and a
kill is sent for t2
- s1 finishes, coordinator state is cleared for that stage
- s2a1 fails, causes s1 to be re-submitted
- t2 finishes before that kill message arrives and is allowed to commit.
If that can happen, it would generate a duplicate map output, but my guess
(hope?) is that the map output tracker would only keep one of them.
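
To make the sequence above concrete, here is a rough toy model in Python. It is only a sketch of the reasoning, not Spark's code: `ToyCommitCoordinator`, its method names, and `simulate_race` are all hypothetical, and it assumes that commit state is kept per stage, that the first task to ask for a partition wins, and that the state is dropped when the stage ends.

```python
class ToyCommitCoordinator:
    """Hypothetical stand-in for a commit coordinator; not Spark's class."""

    def __init__(self):
        # stage id -> {partition -> id of the task allowed to commit}
        self._stages = {}

    def stage_start(self, stage):
        self._stages[stage] = {}

    def stage_end(self, stage):
        # Assumption: all state for the stage is cleared when it finishes.
        self._stages.pop(stage, None)

    def can_commit(self, stage, partition, task):
        winners = self._stages.get(stage)
        if winners is None:
            return False  # unknown stage: deny the commit
        # The first task to ask for a partition wins; later askers are denied.
        return winners.setdefault(partition, task) == task


def simulate_race():
    coord = ToyCommitCoordinator()
    committed = []

    coord.stage_start("s1")               # s1a1: t1 and t2 run partition 0
    if coord.can_commit("s1", 0, "t1"):   # t1 succeeds
        committed.append("t1")
    # A kill for t2 is sent here, but it is asynchronous and still in flight.

    coord.stage_end("s1")                 # s1 finishes, state is cleared
    coord.stage_start("s1")               # s2a1 fails, s1 is re-submitted

    if coord.can_commit("s1", 0, "t2"):   # t2 finishes before the kill lands
        committed.append("t2")
    return committed
```

In this toy model `simulate_race()` returns both `t1` and `t2`, i.e. two commits for the same partition. Whether the real coordinator and map output tracker behave this way is exactly the open question.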