Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4066#issuecomment-70729163
  
    Hmm, let me think about this a bit...
    
    I don't think that we have any end-to-end tests that exercise task speculation right now, since Spark won't schedule a speculative copy of a task on an executor running on the same host as the original attempt. That said, I don't think end-to-end tests are the only (or best) way to test this, since there are a few special cases that require finer-grained control over the event interleaving.
    
    Here are some of the scenarios that I'd like to test:
    
    - Two copies of a task are running and both try to commit at (essentially) 
the same time.  In other words, `needsTaskCommit` should return true for both 
tasks, but only one should be allowed to commit.
    - The task that is authorized to commit crashes before committing.  In this 
case, a new copy of the task should be scheduled and that task should be able 
to successfully commit its output (i.e. the "lock" held by the winning task 
should be released if that task dies without completing the commit process).
    
    Since this involves interplay between the DAGScheduler logic, the OutputCommitCoordinator, and actual tasks, we'll probably have to use a mock TaskSchedulerBackend and mock tasks to test this.
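    
    To make the two scenarios concrete, here's a rough sketch of the behavior I'd want a coordinator-level test to assert. The `ToyCoordinator` below is a stand-in I'm making up purely for illustration; it is not the API added by this PR, and the method names (`canCommit`, `taskFailed`) are placeholders.
    
    ```scala
    import scala.collection.mutable
    
    object CommitCoordinatorSketch {
    
      // Toy model of the commit-authorization protocol described above: the first
      // attempt to ask for a (stage, partition) pair wins the "lock"; the lock is
      // released if the winning attempt fails before committing.
      class ToyCoordinator {
        private val authorized = mutable.Map[(Int, Int), Int]()
    
        def canCommit(stage: Int, partition: Int, attempt: Int): Boolean = synchronized {
          authorized.get((stage, partition)) match {
            case Some(winner) => winner == attempt
            case None =>
              authorized((stage, partition)) = attempt
              true
          }
        }
    
        def taskFailed(stage: Int, partition: Int, attempt: Int): Unit = synchronized {
          if (authorized.get((stage, partition)) == Some(attempt)) {
            authorized.remove((stage, partition))
          }
        }
      }
    
      def main(args: Array[String]): Unit = {
        val coordinator = new ToyCoordinator
    
        // Scenario 1: two copies race to commit; only the first one is authorized.
        assert(coordinator.canCommit(stage = 0, partition = 0, attempt = 1))
        assert(!coordinator.canCommit(stage = 0, partition = 0, attempt = 2))
    
        // Scenario 2: the authorized copy dies before committing; the lock is
        // released and a newly scheduled attempt is allowed to commit.
        coordinator.taskFailed(stage = 0, partition = 0, attempt = 1)
        assert(coordinator.canCommit(stage = 0, partition = 0, attempt = 3))
    
        println("Both scenarios behave as described.")
      }
    }
    ```
    
    The real test would of course have to drive these transitions through the DAGScheduler and task-completion events rather than calling the coordinator directly, which is why the mocks mentioned above seem necessary.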

