On Monday, 6 January 2014 at 04:16:56 UTC, H. S. Teoh wrote:
> Since a null pointer implies that there's some kind of logic error in the code, how much confidence do you have that the other 99 concurrent
> requests aren't being wrongly processed too?

That doesn't matter if the service isn't critical; it only matters if it writes destructively to a database. You can also shut down parts of the service rather than the entire service.
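As a rough illustration of shutting down only part of a service, here is a minimal Python sketch. The `Service` class, the subsystem names and the policy of disabling a subsystem after its first unhandled exception are all hypothetical. They are chosen only to make the idea concrete.

```python
class Service:
    """Toy service whose subsystems can be switched off independently."""

    def __init__(self, handlers):
        self.handlers = dict(handlers)  # subsystem name -> callable
        self.disabled = set()           # subsystems taken offline after a fault

    def call(self, name, *args):
        handler = self.handlers[name]   # unknown names still raise KeyError
        if name in self.disabled:
            return ("unavailable", None)
        try:
            return ("ok", handler(*args))
        except Exception:
            # Take only this subsystem offline; the others keep serving.
            self.disabled.add(name)
            return ("error", None)


svc = Service({
    "chat": lambda msg: msg.upper(),
    "trade": lambda item: item["price"],  # hypothetical handler with a None bug
})
print(svc.call("trade", None))          # ('error', None): None is not subscriptable
print(svc.call("trade", {"price": 3}))  # ('unavailable', None): trade is now off
print(svc.call("chat", "hi"))           # ('ok', 'HI'): chat is unaffected
```

A real server would probably also log the fault and offer a way to re-enable the subsystem once it has been fixed.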

> Based on this, I'm inclined to say that if a web request process
> encountered a NULL pointer, it's probably better to just reset back to a known-good state by restarting.

In many cases it might be, but it should be up to the project management or the organization to set the policy, not the language designer. This is an issue I have with many of the "C++ wannabe languages": they enforce policies that shouldn't be set at the level of a tool (it could be a compiler option, though). My pet peeve is Go and its banning of assert() because many programmers use it in an inappropriate manner. In D you have the overloading of conditionals, among other things. With Ada and Rust it is OK, because they exist to enforce a policy for existing organizations (DoD, Mozilla). Languages that claim to be general-purpose should be more adaptable.

> No, usually you'd set things up so that if the webserver goes down, an init script would restart it. Restarting is preferable, because it resets the program back to a known-good state.

The program might be written in such a way that you know it is in a good state when you catch the null exception.

> careless bug, but a symptom of somebody attempting to inject a root
> exploit?  Blindly continuing will only play into the hand of the
> attacker.

Protection against root exploits should be done at a lower level (e.g. a jail).

> The thing is, a null pointer error isn't just an exceptional condition caused by bad user data; it's a *logic* error in the code. It's a sign
> that something is wrong with the program logic.

And so is an array out-of-bounds access, or a division by zero.

> Tell the client not to do that again? *That* sounds like the formula for a DoS vector (a rogue client deliberately sending the crashing request
> over and over again).

What else can you do? You return an error and block subsequent requests if appropriate.

In a networked computer game you log the misbehaviour, drop the client after a random delay, and possibly block the offender. What you do not want is to disable the entire service. It is better to run a somewhat faulty service that entertains and retains your customers than to shut down until a bug fix appears. If it takes 15-30 seconds to bring the server back up, you cannot afford to reset all the time.
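Here is a minimal Python sketch of that policy, under stated assumptions: `OffenderTracker`, the strike threshold and the 1-5 second delay range are hypothetical choices. The actual disconnect is left to the caller, which would wait for the returned delay and then drop the connection.

```python
import logging
import random

log = logging.getLogger("server")


class OffenderTracker:
    """Counts faulting requests per client and blocks repeat offenders."""

    def __init__(self, max_strikes=3, rng=None):
        self.max_strikes = max_strikes
        self.strikes = {}      # client id -> number of faulting requests so far
        self.blocked = set()   # clients refused from now on
        self.rng = rng or random.Random()

    def is_blocked(self, client):
        return client in self.blocked

    def record_fault(self, client):
        """Log the fault and return how many seconds to wait before dropping the client."""
        n = self.strikes.get(client, 0) + 1
        self.strikes[client] = n
        log.warning("client %s caused fault #%d", client, n)
        if n >= self.max_strikes:
            self.blocked.add(client)
        # Randomised delay, as described above; one plausible motivation is that it
        # makes the drop harder to correlate with the exact request that caused it.
        return self.rng.uniform(1.0, 5.0)
```

Only the offending client is affected. Every other connection keeps being served, which is the point being made above.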

I can point to many launches of online computer games that have resulted in massive losses due to servers going down during the first few weeks. That is actually one good reason not to use C++ in game servers: the lack of robustness to failure. In some domains the ability to keep the service running, and the ability to turn off parts of the service, is more important than correctness. What you want is a log of player resources so that you can restore game balance after a failure.
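One simple way to keep such a log is an append-only journal that can be replayed after a crash. The sketch below assumes a hypothetical one-JSON-object-per-line file format and made-up field names (`player`, `resource`, `delta`). It is not taken from any particular game server.

```python
import json
import os


def append_event(path, player, resource, delta):
    """Append one resource change to the journal (one JSON object per line)."""
    record = {"player": player, "resource": resource, "delta": delta}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the entry to disk before returning


def replay(path):
    """Rebuild per-player resource totals from the journal after a failure."""
    totals = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                break  # a partially written last line, e.g. from a crash mid-write
            key = (event["player"], event["resource"])
            totals[key] = totals.get(key, 0) + event["delta"]
    return totals
```

Because entries are only ever appended, a crash can at worst leave one torn line at the end, which `replay` skips.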

> data and start over. This is a case of a problem with the *code*, which
> means you cannot trust the program will continue doing what you
> 
That depends on how the program is written and in which area the null exception happened. It might even be a known bug that is known to be harmless but would take a long time to locate and fix.

> things will still work the way you think they work, will only lead to your program running the exploit code that has been injected into the
> corrupted stack.

Pages with the execute bit set should be write-protected. Then you can only jump into existing code; injection of new code isn't really possible. So if the existing code is unknown to the attacker, that attack vector is weak.

> The safest recourse is to reset the program back to a known state.

I see no problem with trapping None failures in pure Python and keeping the service running. They tend to happen when you look up a nonexistent object in a database. That is quite innocent if you can backtrack all the way down to the request handler and return an appropriate status code.
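Here is a minimal sketch of that pattern in Python. It assumes a hypothetical in-memory `USERS` table standing in for the database, and a handler that returns a `(status, body)` pair rather than using any particular web framework.

```python
from http import HTTPStatus

# Hypothetical in-memory stand-in for a database table.
USERS = {1: {"name": "alice"}}


def find_user(user_id):
    """Look up a user; returns None when no such object exists."""
    return USERS.get(user_id)


def handle_get_user(user_id):
    """Return (status, body); a missing object becomes a 404 instead of a crash."""
    try:
        user = find_user(user_id)
        return HTTPStatus.OK, {"name": user["name"]}
    except TypeError:
        # user was None ('NoneType' object is not subscriptable). Nothing has been
        # written yet, so unwinding to the handler leaves the process in a known state.
        return HTTPStatus.NOT_FOUND, {"error": "no such user"}
```

Catching `TypeError` this broadly would also hide unrelated bugs in the same handler, which is the trade-off being argued about in this thread. A narrower alternative is an explicit `if user is None` check.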

If you use the safe subset of D, why should it be different?
