I like your solution, much better. This might help me to solve another
problem I'm experiencing with JoinManager and the ZRegisterStorm qa test
at the moment.
Bit short on time lately, I've been helping my Son rebuild the
independent suspension on his 1977 Jag, got it back on the road last
night, sweet ride, smooth & quiet.
Cheers,
Peter.
Patricia Shanahan wrote:
On 7/18/2010 12:13 PM, Peter Firmstone wrote:
Patricia Shanahan wrote:
I have questions about the TaskManager remove(Task) method. Trying to
end work by any means other than run to completion is always a bit
tricky.
In the event of a remove call for an active Task, TaskManager issues
an interrupt unless the call is in the task's thread. It also removes
the Task from its bookkeeping.
In the current implementation, removal from the ArrayList immediately
stops the removed task from blocking later tasks that need to runAfter
it. That can let a dependent task start running before the removed
task notices the interrupt. That could be bad if, for example, they
share data that is being protected through the runAfter system.
Perhaps a hashmap might be used to store tasks that are pending removal,
so they can be left on the queue until the removed task has been
interrupted?
Actually, there is no need for an additional data structure. Document
a requirement that a Task run method should complete ASAP after an
interrupt. During a remove call, if the task is running just send it
an interrupt, but leave it in the same data structure(s) as any
running task. If and when its run method completes, clean it up from
its thread's run method in the normal way.
There are a couple of arguments in favor if this approach:
1. If it goes wrong, the nature of the problem will be obvious in a
debugger. I can even help out debug by associating a flag with the
Task indicating it has been interrupted.
Contrast this with the other failure case. If I remove a task before
its run method completes, a later task that should have run after it
may run in parallel, resulting in a corrupted data structure. In the
worst case, neither notices the problem, but ten minutes later some
other piece of code that works with the same data structure fails
because of the corruption. Such things can be hard to debug.
2. It will be more similar to current behavior, making it less likely
to cause problems in existing code. As described in my e-mail
"TaskManager single thread bottleneck", many of the interesting cases
may be running, much of the time, with one worker thread per
TaskManager. In that case, no other task can run until the removed
task's run method completes, because the removed task holds the only
thread.
If I were to fix the single thread bottleneck but leave the remove
behavior unchanged, some uses of TaskManager might see a task start
running before the removed task has completed for the first time.
The other issue is that a task that ignores the interrupt can keep
possession of its thread, indefinitely, despite a remove call. Since
we only create a bounded number of threads, that could effectively
kill the TaskManager. To what extent are TaskManager instances
shared by tasks whose run methods don't have the same trust
characteristics?
Currently TaskManager is only used internally, so the trust should be
the same between tasks, although I like the idea of a singleton
TaskManager and did mention it earlier, that doesn't suit the current
implementation. We could put an AccessController permission check in
TaskManager before a task is placed into the queue?
Or make it the responsibility of the code that creates and manages a
TaskManager instance to ensure that the mix of tasks is appropriate,
and any needed AccessControl checks have been done?
Patricia