Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/21577
> * fixed the issue Mridul brought up, but I think the race that Tom
describes still exists. I'm just not sure it would cause problems, since as far
as I can tell it can only happen in a map stage, not a result stage.
@vanzin, which race were you referring to here? I think tracking the stage
across attempts fixes both of the ones I mentioned in reference to scenario 2
for Mridul.
> There is another case though here where T1_1.1 could have just asked to
be committed, but not yet committed, then if it gets delayed committing, the
new stage attempt starts and T1_1.2 asks if it could commit and is granted, so
then both try to commit at the same time causing corruption.
Fixed: T1_1.2 won't be allowed to commit, because we track the first stage
attempt's task as the committer.
> The caveat there though would be if since T1_1.1 was committed, the
second stage attempt could finish and call commitJob while T1_1.2 is committing
since spark thinks it doesn't need to wait for T1_1.2. Anyway this seems very
unlikely but we should protect against it.
T1_1.2 shouldn't ever be allowed to commit, since we track committers across
stage attempts, so it would never commit after the stage itself has completed.