On 10/3/2014 10:00 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:
What I'm asking you to consider is a use-case, one that I picked quite
carefully.  Without assuming anything about how the system is architected, if we
have a telephone exchange, and an Error occurs in the handling of a single call,
it seems to me fairly unarguable that it's essential to avoid this bringing down
everyone else's call with it.  That's not simply a matter of convenience -- it's
a matter of safety, because those calls might include emergency calls, urgent
business communications, or any number of other circumstances where dropping
someone's call might have severe negative consequences.

What you're doing is attempting to write a program with the requirement that the program cannot fail.

It's impossible.

If that's your requirement, the system needs to be redesigned so that it can accommodate the failure of the program.

(Ignoring bugs in the program is not accommodating failure, it's pretending that the program cannot fail.)


As I'm sure you realize, I also picked that particular use-case because it's one
where there is a well-known technological solution -- Erlang -- which has as a
key feature its ability to isolate different parts of the program, and to deal
with errors by bringing down the local process where the error occurred, rather
than the whole system.  This is an approach which is seriously battle-tested in
production.

As I (and Brad) have stated before, process isolation, shutting down the failed process, and restarting it are acceptable, because processes are isolated from each other.

Threads are not isolated from each other. They are not. Not. Not.


As I said, I'm not asking you to endorse catching Errors in threads, or other
gross simplifications of Erlang's approach.  What I'm interested in are your
thoughts on how we might approach resolving the requirement for this kind of
stability and localization of error-handling with the tools that D provides.

I don't mind if you say to me "That's your problem" (which it certainly is:-),
but I'd like it to be clear that it _is_ a problem, and one that it's important
for D to address, given its strong standing in the development of
super-high-connectivity server applications.

The only way to have super high uptime is to design the system so that failure is isolated, and the failed process can be quickly restarted or replaced. Ignoring bugs is not isolation, and hoping that bugs in one thread don't affect memory shared by other threads doesn't work.
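
To make the shape of that design concrete, here is a minimal sketch of a supervisor that runs a worker in a separate process and restarts it when it dies. It is written in Python rather than D purely for brevity, and everything in it is an assumption made for illustration: the worker script, the `supervise` function, and the restart limit are hypothetical names, not part of any existing system.

```python
import subprocess
import sys

# Hypothetical worker: hits a simulated bug on its first attempt and
# succeeds on later attempts. It runs in its own OS process, so whatever
# state it corrupted disappears when the process dies.
WORKER_SOURCE = """
import sys
attempt = int(sys.argv[1])
if attempt == 0:
    raise AssertionError("simulated bug in the call handler")
print("handled call on attempt", attempt)
"""


def supervise(max_restarts=3):
    """Run the worker in a child process and restart it if it fails.

    Returns (number_of_restarts, worker_output) once the worker succeeds.
    """
    for attempt in range(max_restarts + 1):
        result = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, str(attempt)],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return attempt, result.stdout.strip()
        # No attempt is made to repair or continue the failed worker.
        # The supervisor shares no memory with it and simply starts a
        # fresh process.
    raise RuntimeError("worker failed on every attempt")


if __name__ == "__main__":
    restarts, output = supervise()
    print("restarts:", restarts, "| output:", output)
```

The point of the sketch is that the supervisor never tries to continue a worker that has hit a bug. It only observes the exit status and starts a fresh process, which is sound only because the two share no memory.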
