[ 
https://issues.apache.org/jira/browse/SPARK-19698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881771#comment-15881771
 ] 

Jisoo Kim commented on SPARK-19698:
-----------------------------------

Ah, I see what you mean. I don't use Spark's speculation feature, so I wasn't 
aware that the original running tasks aren't killed when speculative copies of 
them are launched. What is the reason for not killing the stale tasks that 
were superseded? Is it for performance? 

I found that TaskSetManager kills all the other attempts of a given task once 
one of the attempts succeeds: 
https://github.com/apache/spark/blob/d9043092caf71d5fa6be18ae8c51a0158bc2218e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L709

However, the above scenario still concerns me when the task performs some 
other long-running computation after modifying external state. In that case, 
Attempt 1 can be launched after Attempt 0 has finished modifying external 
state (but is still computing), and can get partway through its own 
modification. If Attempt 1 is then killed, or all other partitions "finish" 
before Attempt 1 does, the same problem can occur. 

I wonder if this approach is a viable solution:
- Record additional information (the task's attemptNumber from its task info) 
when adding the task index to speculatableTasks 
(https://github.com/apache/spark/blob/d9043092caf71d5fa6be18ae8c51a0158bc2218e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L937)
- Have TaskSetManager notify the driver only when the completed task is not 
in speculatableTasks 
(https://github.com/apache/spark/blob/d9043092caf71d5fa6be18ae8c51a0158bc2218e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L706)
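
A rough sketch of the bookkeeping this would require (hypothetical names, not 
Spark's actual implementation — Spark's real speculatableTasks holds only task 
indices, and markSpeculatable/shouldNotifyDriver are invented here for 
illustration):

```scala
// Hypothetical sketch of the proposed bookkeeping: remember which attempt
// was already running when a task index became speculatable, and suppress
// the driver notification when that stale attempt completes.
object SpeculationSketch {
  // task index -> attemptNumber of the copy that was outpaced
  // (Spark's real speculatableTasks is a set of indices only)
  val speculatableTasks = scala.collection.mutable.Map.empty[Int, Int]

  def markSpeculatable(index: Int, runningAttempt: Int): Unit =
    speculatableTasks(index) = runningAttempt

  // Report a completion to the driver only if it did not come from the
  // stale attempt recorded above.
  def shouldNotifyDriver(index: Int, attempt: Int): Boolean =
    !speculatableTasks.get(index).contains(attempt)
}
```

Under this sketch, a stale original attempt's success would be suppressed 
while the speculative copy's success is still reported, so the stage could not 
be marked finished on the strength of the superseded attempt alone.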



> Race condition in stale attempt task completion vs current attempt task 
> completion when task is doing persistent state changes
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19698
>                 URL: https://issues.apache.org/jira/browse/SPARK-19698
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Charles Allen
>
> We have encountered a strange scenario in our production environment. Below 
> is the best guess we have right now as to what's going on.
> Potentially, the final stage of a job has a failure in one of its tasks (such 
> as an OOME on the executor), which can cause tasks for that stage to be 
> relaunched in a second attempt.
> The DAGScheduler
> (https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1155)
> keeps track of which tasks have been completed, but does NOT keep track of 
> which attempt those tasks were completed in. As such, we have encountered a 
> scenario where a particular task gets executed twice in different stage 
> attempts, and the DAGScheduler does not consider whether the second attempt is 
> still running. This means that if the first task attempt succeeded, the second 
> attempt can be cancelled partway through its run if all other tasks 
> (including the previously failed one) complete successfully.
> What this means is that if a task is manipulating some persistent state 
> somewhere (for example, an upload-to-temporary-file-location followed by a 
> delete-then-move on an underlying s3n storage implementation), the driver can 
> improperly shut down the running (2nd attempt) task between state 
> manipulations, leaving the persistent state corrupted: the 2nd attempt never 
> gets to complete its manipulations and is terminated prematurely at some 
> arbitrary point in its state-change logic (e.g. it finished the delete but 
> not the move).
> This is using the mesos coarse grained executor. It is unclear if this 
> behavior is limited to the mesos coarse grained executor or not.
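
To make the hazard in the description concrete, here is a minimal, 
hypothetical model of a delete-then-move commit (plain java.nio on a local 
filesystem, standing in for the s3n implementation described above): a kill 
that lands between the two steps leaves neither the old output nor the new 
output in place.

```scala
import java.nio.file.{Files, Paths, StandardCopyOption}

// Hypothetical model of the non-atomic commit pattern described above;
// not Spark's or Hadoop's actual code.
object CommitSketch {
  def commit(tmp: String, dest: String): Unit = {
    val destPath = Paths.get(dest)
    Files.deleteIfExists(destPath)        // step 1: delete the old output
    // <-- a task kill arriving here leaves NO output at all
    Files.move(Paths.get(tmp), destPath,  // step 2: move the new output in
      StandardCopyOption.REPLACE_EXISTING)
  }
}
```

When the kill arrives mid-commit, the delete has already happened but the move 
never runs, which matches the "finished the delete but not the move" failure 
mode in the report.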



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
