On Mon, Jan 06, 2014 at 02:24:09AM +0000, digitalmars-d-boun...@puremagic.com wrote:
> On Sunday, 5 January 2014 at 15:19:15 UTC, H. S. Teoh wrote:
> > Isn't that usually handled by running the webserver itself as a
> > separate process, so that when the child segfaults the parent returns
> > HTTP 501?
>
> You can do that. The hard part is how to deal with the other 99
> non-offending concurrent requests running in the faulty process.
Since a null pointer implies that there's some kind of logic error in the code, how much confidence do you have that the other 99 concurrent requests aren't being wrongly processed too?

> How does the parent process know which request was the offending,
> and what if the parent process was the one failing, then you should
> handle it in the front-end-proxy anyway?

Usually the sysadmin would set things up so that if the front-end proxy dies, it gets restarted by a script in (hopefully) a clean state.

> Worse, cutting off all requests could leave trash around in the
> system where requests write to temporary data stores where it is
> undesirable to implement a full logging/cross-server transactional
> mechanism.

That could be a DoS vector.

I've had to deal with this issue before at my work (it's not related to webservers, but I think the same principle applies). There's a daemon that has to run an operation to clean up a bunch of auxiliary data after the user initiates the removal of certain database objects. The problem is, some of the cleanup operations are non-trivial and can fail (it could be an error returned from deep within the cleanup code, or a segfault, or whatever). So I wrote some complex scaffolding code to catch these kinds of problems and try to clean things up afterwards.

But eventually we found that attempting this sort of error recovery was actually counterproductive: it made the code more complicated, and it added intermediate states. In addition to "object present" and "object deleted", there was now "object partially deleted", and all code had to detect this state and decide what to do with it. Then customers started seeing the "object partially deleted" state, which was never part of the design of the system, and that led to all sorts of odd behaviour (certain operations don't work, the object shows up in some places but not others, etc.).
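To make the point concrete, here's a minimal sketch (in Python, since it comes up later in the thread) of the two-state design we ended up with. All names here (Store, delete_object, _cleanup_aux) are illustrative, not from any real system: the cleanup step may fail and leave stray auxiliary data behind, but the object's externally visible state is only ever "present" or "absent" -- there is deliberately no "partially deleted" state for other code to trip over.

```python
class Store:
    """Illustrative two-state design: objects are 'present' or 'absent'."""

    def __init__(self):
        self.objects = {}   # name -> payload (the visible state)
        self.aux = {}       # name -> auxiliary data (cleanup target)

    def delete_object(self, name):
        # Remove the object first, so the visible state flips from
        # "present" to "absent" in a single step.
        self.objects.pop(name, None)
        try:
            self._cleanup_aux(name)
        except Exception:
            # Cleanup failed: accept the stray aux data rather than
            # introducing a "partially deleted" state that every other
            # code path would then have to understand.
            pass

    def _cleanup_aux(self, name):
        if name == "flaky":   # simulate a cleanup failure for one object
            raise RuntimeError("simulated cleanup failure")
        self.aux.pop(name, None)

store = Store()
store.objects = {"ok": 1, "flaky": 2}
store.aux = {"ok": "x", "flaky": "y"}
store.delete_object("ok")
store.delete_object("flaky")
assert "flaky" not in store.objects   # the object is simply "absent"...
assert "flaky" in store.aux           # ...even though stray aux data remains
```

The stray entry in `aux` is the "wasted space" trade-off described above: it costs disk, but it never shows up as an inconsistent object state.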
Finally, we decided that it's better to keep the system in simple, well-defined states (only "object present" and "object not present"), even if that comes at the cost of leaving stray unreferenced data lying around from a previous failed cleanup operation.

Based on this, I'm inclined to say that if a web request process encounters a NULL pointer, it's probably better to just reset back to a known-good state by restarting. Sure, it leaves a bunch of stray data around, but reducing code complexity often outweighs saving wasted space.

> > HTTP link? I rather the process segfault immediately rather than
> > continuing to run when it detected an obvious logic problem with
> > its own code).
>
> And not start up again, keeping the service down until a bugfix
> arrives?

No, usually you'd set things up so that if the webserver goes down, an init script restarts it. Restarting is preferable because it resets the program back to a known-good state. Continuing to barge on when something has obviously gone wrong (a null pointer where it's not expected) is risky: what if that null pointer is not due to a careless bug, but a symptom of somebody attempting to inject a root exploit? Blindly continuing will only play into the hands of the attacker.

> A null pointer error can be a innocent bug for some services, so I
> don't think the programming language should dictate what you do,
> though you probably should have write protected code-pages with
> execute flag.

The thing is, a null pointer error isn't just an exceptional condition caused by bad user data; it's a *logic* error in the code. It's a sign that something is wrong with the program logic. I don't consider that an "innocent error"; it's a sign that the code can no longer be trusted to do the right thing. So I'd say it's safer to terminate the program and have the restart script reset the program state back to a known-good initial state.

> E.g.
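The "init script restarts it" idea above can be sketched as a tiny supervisor loop. This is a hedged illustration, not any real init system: the `supervise` function and the demo child (which "crashes" once, then comes back up cleanly) are both hypothetical stand-ins for a webserver that hits a null pointer and gets restarted into a known-good state.

```python
import os
import subprocess
import sys
import tempfile

def supervise(cmd, max_restarts=3):
    """Run cmd; restart it each time it exits non-zero (i.e. crashes)."""
    restarts = 0
    while True:
        code = subprocess.call(cmd)
        if code == 0:            # clean shutdown: stop supervising
            return restarts
        restarts += 1            # crash: restart from a known-good state
        if restarts >= max_restarts:
            return restarts      # repeated crashes: give up, page a human

# Demo child: exits 1 ("segfault") on its first run, then exits 0 once
# the marker file exists -- standing in for a server whose bad state is
# cleared by the restart.
marker = os.path.join(tempfile.mkdtemp(), "started-once")
child_src = (
    "import os, sys\n"
    "p = %r\n"
    "if os.path.exists(p):\n"
    "    sys.exit(0)\n"
    "open(p, 'w').close()\n"
    "sys.exit(1)\n"
) % marker

restarts = supervise([sys.executable, "-c", child_src])
print(restarts)   # 1: crashed once, was restarted, then shut down cleanly
```

The `max_restarts` cap matters: without it, a reproducible crash-on-startup bug would make the supervisor flap forever instead of keeping the service down until a bugfix arrives.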
> I don't think it makes sense to shut down a trivial service
> written in "Python" if it has a logic flaw that tries to access a
> None pointer for a specific request if you know where in the code it
> happens. It makes sense to issue an exception, catch it in the
> request handler free all temporary allocated resources and tell the
> offending client not to do that again and keep the process running
> completing all other requests. Otherwise you have a DoS vector?

Tell the client not to do that again? *That* sounds like the formula for a DoS vector (a rogue client deliberately sending the crashing request over and over again).

> It should be up to the application programmer whether the program
> should recover and complete the other 99 concurrent requests before
> resetting, not the language. If one http request can shut down the
> other 99 requests in the process then it becomes a DoS vector.

I agree with the principle that the programmer should decide what happens, but I think there's a wrong assumption here: that the *program* is fit to make this decision after encountering a logic error like an unexpected null pointer. Again, this is not a case of bad user input, where the problem is just with the data and you can throw away the bad data and start over. This is a case of a problem with the *code*, which means you cannot trust that the program will continue doing what you designed it to do. The null pointer proves that the program state *isn't* what you assumed it is, so you can no longer trust that any subsequent code will actually do what you think it should do.

This kind of misplaced assumption is the underlying basis for things like stack corruption exploits: under normal circumstances your function call will simply return to its caller after it finishes, but now it actually *doesn't* return to the caller. There's no way you can predict where it will go, because the fundamental assumptions about how the stack works no longer hold due to the corruption.
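For what it's worth, here is a sketch of the per-request recovery the quoted poster describes -- the approach I'm arguing against for logic errors. The dispatcher catches the handler's None-access bug, returns 500 for that one request, and keeps serving the others. The handler and request shapes are entirely hypothetical.

```python
def buggy_handler(req):
    """Hypothetical handler with a logic flaw: lookup yields None for id 13."""
    user = None if req.get("id") == 13 else {"name": "alice"}
    return 200, user["name"]          # TypeError when user is None

def dispatch(requests):
    """Catch the per-request logic error and keep the process running."""
    responses = []
    for req in requests:
        try:
            responses.append(buggy_handler(req))
        except TypeError:
            # The error is confined to this one request, and the other
            # requests complete -- but note that a rogue client can now
            # trigger this path repeatedly, at will.
            responses.append((500, "internal error"))
    return responses

out = dispatch([{"id": 1}, {"id": 13}, {"id": 2}])
assert [status for status, _ in out] == [200, 500, 200]
```

This is exactly the trade-off in dispute: the other 99 requests survive, but the process keeps running with state that a demonstrated logic error says you can no longer fully trust.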
Blindly assuming that things will still work the way you think they work will only lead to your program running the exploit code that has been injected into the corrupted stack. The safest recourse is to reset the program back to a known state.


T

-- 
People say I'm arrogant, and I'm proud of it.