Patricia Shanahan wrote:
Greg Trasuk wrote:
On Tue, 2010-07-13 at 13:03, Patricia Shanahan wrote:

Several customers have, over the years, beaten into my head the understanding that a crash is better than a wrong answer. That is particularly the case for a redundant service.


A koan to lead you to Jini zen: "If a service crashes, but its client
does not get a reply, did it really crash?"

Or, as I've said in other places, "It's a network, deal with it".  Jini
is all about graceful recovery, not crash prevention. Fallacy 1 - "The
network is reliable".

Indeed. Given that unreliability, and "The network is the computer", an
effectively reliable application has to be able to deal with a server
JVM crash. On the other hand, it is reasonably easy to ensure that a
network has a low probability of producing a completely wrong answer.

I particularly dislike bugs that make things a bit flaky without
happening often enough to be easy to reproduce or nail down. It can be
very hard to get them fixed.

Failing sooner, rather than later, sounds like the right answer.


Meanwhile, I have a question about a related issue. If a Task's runAfter method throws anything, the exception is logged and TaskManager acts as though runAfter had returned false, indicating no dependencies.

That is the high-performance but high-functional-risk option. It may create low-frequency, hard-to-reproduce timing bugs, my least favorite type of bug.

An alternative would be to make the task wait for completion of all older tasks. That is worse performance but much safer.
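A minimal sketch of the two policies being compared, in Java. It does not use River's actual TaskManager: `Task`, `mustWait`, `Independent`, `Faulty` and `RUN_AFTER_FAILURES` are hypothetical names, and the `runAfter(List, int)` shape is only assumed to mirror the method under discussion. It also assumes the manager re-asks a waiting task each time an older task completes, so answering "true while any older task remains" amounts to waiting for all older tasks. The failure counter is an illustrative addition for seeing how often the fallback is actually taken.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RunAfterPolicy {
    // Hypothetical stand-in modeled on TaskManager.Task; not the actual River interface.
    public interface Task extends Runnable {
        // True if this task must run after at least one of tasks.get(0 .. size-1).
        boolean runAfter(List<Task> tasks, int size);
    }

    private static final Logger LOG = Logger.getLogger(RunAfterPolicy.class.getName());

    // Counts runAfter failures so we can see how often they occur.
    public static final AtomicInteger RUN_AFTER_FAILURES = new AtomicInteger();

    // Decides whether 'task' must wait, given the older tasks still pending.
    // conservative == false: a throwing runAfter means "no dependencies" (current behavior).
    // conservative == true:  a throwing runAfter means "depends on every older task".
    public static boolean mustWait(Task task, List<Task> older, boolean conservative) {
        try {
            return task.runAfter(older, older.size());
        } catch (Throwable t) { // "throws anything", including Errors
            RUN_AFTER_FAILURES.incrementAndGet();
            LOG.log(Level.WARNING, "runAfter threw; conservative=" + conservative, t);
            // Assuming the task is re-asked whenever an older task finishes, returning
            // true while any older task remains makes it wait for all of them.
            return conservative && !older.isEmpty();
        }
    }

    // A task with no dependencies at all.
    public static class Independent implements Task {
        public void run() { }
        public boolean runAfter(List<Task> tasks, int size) { return false; }
    }

    // Simulates a non-trivial runAfter that has a bug in it.
    public static class Faulty implements Task {
        public void run() { }
        public boolean runAfter(List<Task> tasks, int size) {
            throw new IllegalStateException("bug in dependency check");
        }
    }

    public static void main(String[] args) {
        List<Task> older = new ArrayList<>();
        older.add(new Independent());
        System.out.println(mustWait(new Faulty(), older, false)); // false: may run too early
        System.out.println(mustWait(new Faulty(), older, true));  // true: waits for older tasks
    }
}
```

The lenient branch trades correctness for throughput; the conservative branch can only cost time, never a wrong ordering.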

Thoughts?


Not sure.  Might there be a deadlock issue?  Probably not, but you'd
have to take a look at TaskManager and its dependents.

I would rather produce a deadlock than a wrong answer. The dangerous
case is if a Task's run method issues, directly or indirectly, another
task to the same TaskManager and waits for it. However, the current code
for the case in which there are dependencies can lead to a situation in
which there is only one thread, regardless of the TaskManager
constructor parameters. That would have the same effect, because all
tasks have to wait for that thread to get around to them.
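The dangerous case can be reproduced with a standard java.util.concurrent thread pool standing in for TaskManager. This is an assumption for illustration only: it is not the River class, and `SelfWaitDemo` and `selfWaitTimesOut` are hypothetical names. With one thread, an outer task that submits an inner task to the same pool and waits for it can never be satisfied; the timeout exists only so the demo terminates. With two threads the inner task runs normally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SelfWaitDemo {
    // An outer task submits an inner task to the SAME pool and waits for it.
    // Returns true if the wait timed out, i.e. it would block forever without a timeout.
    public static boolean selfWaitTimesOut(int threads, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Future<Boolean> outer = pool.submit(() -> {
                Future<String> inner = pool.submit(() -> "inner done");
                try {
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false; // another thread was free to run the inner task
                } catch (TimeoutException e) {
                    return true;  // the only thread is busy running this outer task
                }
            });
            return outer.get();
        } finally {
            pool.shutdownNow();   // discards the inner task if it never ran
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(selfWaitTimesOut(1, 200));  // true
        System.out.println(selfWaitTimesOut(2, 2000)); // false
    }
}
```

The single-thread result is the point made above: once everything funnels through one thread, a task that waits on a later task it issued itself cannot make progress.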

Yes, why not wait for completion of all older tasks? Then, by logging it, we can see how often it occurs in practice.



It's probably worth noting that since it's part of "com.sun.jini", one
could likely view TaskManager as an implementation detail for the other
com.sun.jini classes (e.g. the Reggie implementations).  In other words,
you don't need to design for general functionality, just usage scenarios
within River itself.

There are dozens of uses, so working out what they all want is
difficult. I believe most of them are effectively orphans, in the sense
of having no river-dev participant who knows about their needs.

One issue to consider. The Task implementations that do not use
dependencies at all have *very* simple runAfter code. A method that
simply unconditionally returns false has a very low probability of
throwing anything. If we do get an exception or error from runAfter, it
is almost certainly from a non-trivial runAfter method, belonging to a
task that may need to wait for an older task.
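To illustrate the distinction, here is a hedged sketch of the two shapes of runAfter, using a hypothetical `Task` interface modeled on the one under discussion. `KeyedTask` and its string key are invented for illustration; real ordered tasks in River decide their dependencies in their own ways.

```java
import java.util.ArrayList;
import java.util.List;

public class RunAfterShapes {
    // Hypothetical stand-in modeled on TaskManager.Task; not the actual River interface.
    public interface Task extends Runnable {
        boolean runAfter(List<Task> tasks, int size);
    }

    // Typical independent task: the whole dependency check is one constant.
    public static class IndependentTask implements Task {
        public void run() { }
        public boolean runAfter(List<Task> tasks, int size) {
            return false;
        }
    }

    // Typical ordered task: must run after any older task with the same key
    // (the key is a hypothetical example, e.g. an event registration ID).
    public static class KeyedTask implements Task {
        private final String key;

        public KeyedTask(String key) { this.key = key; }

        public void run() { }

        public boolean runAfter(List<Task> tasks, int size) {
            for (int i = 0; i < size; i++) { // only the first 'size' entries count
                Task t = tasks.get(i);
                // Type checks, casts and field access like these are where an
                // exception could plausibly come from if this code had a bug.
                if (t instanceof KeyedTask && ((KeyedTask) t).key.equals(key)) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        List<Task> older = new ArrayList<>();
        older.add(new IndependentTask());
        older.add(new KeyedTask("a"));
        System.out.println(new IndependentTask().runAfter(older, older.size())); // false
        System.out.println(new KeyedTask("a").runAfter(older, older.size()));    // true
        System.out.println(new KeyedTask("a").runAfter(older, 1));               // false
    }
}
```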

Patricia

Makes sense.

Cheers,

Peter.
