On 7/18/2010 12:13 PM, Peter Firmstone wrote:
Patricia Shanahan wrote:
I have questions about the TaskManager remove(Task) method. Trying to
end work by any means other than run to completion is always a bit
tricky.

In the event of a remove call for an active Task, TaskManager issues
an interrupt unless the call is in the task's thread. It also removes
the Task from its bookkeeping.

In the current implementation, removal from the ArrayList immediately
stops the removed task from blocking later tasks that need to runAfter
it. That can let a dependent task start running before the removed
task notices the interrupt. That could be bad if, for example, they
share data that is being protected through the runAfter system.

Perhaps a hashmap might be used to store tasks that are pending removal,
so they can be left on the queue until the removed task has been
interrupted?

Actually, there is no need for an additional data structure. Document a requirement that a Task run method should complete ASAP after an interrupt. During a remove call, if the task is running just send it an interrupt, but leave it in the same data structure(s) as any running task. If and when its run method completes, clean it up from its thread's run method in the normal way.

There are a couple of arguments in favor if this approach:

1. If it goes wrong, the nature of the problem will be obvious in a debugger. I can even help out debug by associating a flag with the Task indicating it has been interrupted.

Contrast this with the other failure case. If I remove a task before its run method completes, a later task that should have run after it may run in parallel, resulting in a corrupted data structure. In the worst case, neither notices the problem, but ten minutes later some other piece of code that works with the same data structure fails because of the corruption. Such things can be hard to debug.

2. It will be more similar to current behavior, making it less likely to cause problems in existing code. As described in my e-mail "TaskManager single thread bottleneck", many of the interesting cases may be running, much of the time, with one worker thread per TaskManager. In that case, no other task can run until the removed task's run method completes, because the removed task holds the only thread.

If I were to fix the single thread bottleneck but leave the remove behavior unchanged, some uses of TaskManager might see a task start running before the removed task has completed for the first time.

The other issue is that a task that ignores the interrupt can keep possession 
of its thread, indefinitely, despite a remove call. Since we only create a 
bounded number of threads, that could effectively kill the TaskManager. To what 
extent are TaskManager instances shared by tasks whose run methods don't have 
the same trust characteristics?

Currently TaskManager is only used internally, so the trust should be the same 
between tasks, although I like the idea of a singleton TaskManager and did 
mention it earlier, that doesn't suit the current implementation.  We could put 
an AccessController permission check in TaskManager before a task is placed 
into the queue?

Or make it the responsibility of the code that creates and manages a TaskManager instance to ensure that the mix of tasks is appropriate, and any needed AccessControl checks have been done?

Patricia


Reply via email to