Gregg Wonderly wrote:
Patricia Shanahan wrote:
On 7/22/2010 12:10 PM, Gregg Wonderly wrote:
I'm thinking more and more that this runAfter business should really be
a O(1) thing. If runAfter() is implemented to return true, can not that
decision be made at the time of enqueue? It just looks like so much of
this is spread around to so few actual users in the current API. As I
look over the usages, there is some fuzzy logic involved in some cases
where the sequence number is used for ordering. ...
I think some of the sequence number stuff may be OK, but if it is OK
it is probably unnecessarily inefficient.
One of the key things about sequence numbers, is that it lets both the
sender and receiver of such marked data understand the significance of
what they have. TCP uses sequence numbers on the IP datagrams to
maintain ordering. It, of course has retry to get all the pieces that
might be missing.
To clarify a bit, I'm only talking about sequence number use in
TaskManager.Task.runAfter. Sequence numbers in general have many
practical uses, but perhaps the most important is efficient
implementation of a protocol with stronger order and/or reliability
guarantees than the underlying channel. TCP is an excellent example.
However, in those uses the recipient is aware of the sequence numbering,
and has rules for dealing with missing sequence numbers. Again, the TCP
retries you mention are a good example of missing sequence number handling.
In the case of task manager, it helps asynchronous activity happening on
separate threads, understand how to manage data races such that outcomes
can be predictable no matter how scheduling occurs.
Here is where I disagree. TaskManager itself is not sequence-number
aware. Suppose T1 and T2 are two tasks with sequence numbers 1 and 2
respectively. T1 and T2 are of the same type, and their runAfter method
would consider them a match in everything it tests except for sequence
number.
T2 will return true for a runAfter test on a list containing only T1. T1
will return false for a runAfter test on a list containing only T2,
because of the sequence number test. Both return false on an empty list.
If they are always added to a TaskManager in the order T1 then T2, the
sequence number check is unnecessary. If a task appears in the runAfter
check collection and conflicts with the current task, it has a lower
sequence number than the current task.
The interesting case is T2 added before T1. Suppose the TaskManager is
empty and has a couple of waiting threads at the time T2 is added. T2
sees an empty runAfter check list, so it is immediately runnable and the
thread is notified. The TaskManager thread may have committed to running
T2 before T1 is added.
Now T1 does its runAfter check on a list containing only T2, and returns
false, so it is also immediately runnable, causing a notify for the
second waiting thread. After all this, T1 and T2 are running simultaneously.
Does it make any sense to delay starting T2 until T1 has finished if
they are added in the order T1 then T2, but run them in parallel if they
are added in the order T2 then T1? Either we can safely run them in
parallel, and neither should runAfter the other, or they cannot safely
run them in parallel, and each should wait for the other if it appears
in the runAfter check list. The sequence number test asymmetry makes no
sense to me.
I am particularly concerned about RetryTask subclasses, because even if
T1 and T2 are initially added in order, T1 may come back around on a
retry after T2 has started, and will not wait for T2 to finish. That is
why I'm so interested in RetryTask retry test coverage.
There may be something else in the relevant RetryTask cases preventing
adding T2 until T1 has succeeded, but in that case the sequence number
check is a waste of cycles and memory accesses.
Patricia