Re: [nodejs] Re: State of the art for request isolation in http servers?

Gregg Caines Wed, 15 Jan 2014 06:33:09 -0800

Thanks for spending the time on this, Forrest (and everyone else so far as 
well, of course!).


> that the sane thing to do when your Node process encounters an error is to 
> shut down and restart the process.


I actually agree that uncaught errors should crash the server, but what I'm 
asking for is a way to catch errors that happen within particular sections 
of code (the routes).  Other languages with exceptions allow you to catch 
exceptions at appropriate times other than the two options I feel like I'm 
faced with: right away (so write perfect code, and have perfect data), or 
not at all (resulting in a server restart).  If I know the places that I 
can safely "catch all", do something, then continue on, that should be 
possible -- this is common in a number of applications including most 
(all?) web applications.
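
A minimal sketch of that kind of per-route "catch all", assuming core 
domains are the mechanism used.  The names `isolate` and `routes` and the 
two paths are made up for illustration, and this only shows the shape of 
the idea: it does not claim that continuing after an arbitrary error is 
always safe, since the failed request may leave shared state behind.

```javascript
'use strict';
// Sketch only: `isolate`, `routes`, and both paths are made-up names.
const domain = require('domain'); // core module, deprecated in later Node releases
const http = require('http');

// Wrap a route handler so that an error thrown by it, even from a later
// callback, fails only the request that caused it.
function isolate(handler) {
  return function (req, res) {
    const d = domain.create();
    d.add(req);
    d.add(res);
    d.on('error', function (err) {
      console.error('request failed:', req.url, err.message);
      if (!res.headersSent) res.statusCode = 500;
      res.end('internal error\n');
    });
    d.run(function () {
      handler(req, res);
    });
  };
}

const routes = {
  '/ok': function (req, res) { res.end('fine\n'); },
  // The throw happens in a later tick, where a try/catch around the route
  // call would never see it.
  '/boom': function (req, res) {
    setImmediate(function () { throw new Error('bad code in one route'); });
  }
};

const server = http.createServer(isolate(function (req, res) {
  const route = routes[req.url];
  if (route) return route(req, res);
  res.statusCode = 404;
  res.end('not found\n');
}));

// server.listen(8080);  // uncomment to try it with curl
```

With the server listening, a request to /boom should get a 500 while a 
later request to /ok is still served by the same process.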

> Another piece of this is to partition your services such that you don't 
> have the problem of 10,000 clients being at risk if one request crashes. 
> Think about scaling your app horizontally (using cluster or something 
> similar) to keep each process dealing with a smaller number of clients if 
> you can. PHP handles this better (in terms of your problem -- there are 
> always tradeoffs!) because each PHP script is being run in its own context 
> (which for all intents and purposes is a process -- if one PHP handler 
> crashes, it has no effect on the others), which is just a fundamentally 
> different model from Node.


Well, the whole reason we pay "the async complexity tax" of node is to get 
10K+ concurrent connections.  Many people (with simpler apps) do better 
than that 
(http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/). 
So, while we're actually clustering and spread over one or two dozen 
machines, a single process dying can still cost us 10K+ requests in 
progress.  And of course it's common for a user to retry a failed request, 
taking down an entire server again.
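
A minimal sketch of the "let in-flight requests finish before restarting" 
idea, assuming a plain core `http` server.  The function name 
`drainAndExit` and its parameters are made up for illustration, and it 
only limits the damage of a restart: long-lived or keep-alive connections 
would still be cut off when the timeout fires.

```javascript
'use strict';
// Sketch only: `drainAndExit` and its parameters are made-up names.
const http = require('http');

// Stop accepting new connections, give in-flight requests up to
// `timeoutMs` to finish, then exit so a supervisor (cluster master,
// upstart, etc.) can start a fresh process.
// `exit` is injectable so the function can be tried without really exiting.
function drainAndExit(server, timeoutMs, exit) {
  exit = exit || function (code) { process.exit(code); };
  let finished = false;

  function finish(code) {
    if (finished) return;
    finished = true;
    clearTimeout(timer);
    exit(code);
  }

  // Safety net: do not wait forever on a stuck or keep-alive connection.
  const timer = setTimeout(function () { finish(1); }, timeoutMs);

  // close() refuses new connections; its callback fires once the
  // existing ones have ended.
  server.close(function () { finish(0); });
}
```

A domain 'error' handler or a process 'uncaughtException' handler could 
call something like drainAndExit(server, 30000) instead of exiting right 
away.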

> 3. If you want something more strongly biased towards keeping your node 
> processes up and running, Adam Crabtree's trycatch (
> https://github.com/CrabDude/trycatch) takes a similar approach to domains 
> while even more tightly binding code that can fail to an error handler. At 
> this point, domains and trycatch to me feel like similar flavors of the 
> same strategy, but the trycatch API is dead simple and that might appeal to 
> you more than domains, which do have some complexity to them. Adam also has 
> a very different philosophy than the Node core team, where he wants to keep 
> Node processes running and actually recover from errors more often than 
> not, and trycatch reflects that philosophy.


Thanks! I wrote some quick experiments, and trycatch is working nicely and 
seems to basically wrap domains with a simpler interface (IMO)... I think 
we'll give that a go.  Thanks!

G

    
On Tuesday, January 14, 2014 10:15:10 PM UTC-8, Forrest L Norvell wrote:
>
> As I see it, there are a few paths open to you that don't require you to 
> do a total rewrite using a different framework:
>
> 1. This is pretty much the exact problem that domains were designed to 
> solve. It's up to you to decide whether you want to recover from errors or 
> to shut down the process (gracefully, in a way that lets it finish handling 
> all the requests you have in flight), but they will at least let you put 
> the error-handling closer to where the errors are coming from and allow you 
> to deal with them and not just crash. Domains are still a part of the 
> platform, they're proven at this point, and while they're not perfect, 
> they're a fairly easy incremental step for you.
>
> 2. If you're using Express or Restify, it's pretty easy to write a 
> middleware that will make your middleware chain domains-aware:
>
> var domain = require('domain');
> function domainsify(request, response, next) {
>   var d = domain.create();
>   // ourHandler could stop the HTTP server, preventing any new requests
>   // from being handled while still allowing in-flight requests to finish
>   d.on('error', ourHandler);
>
>   // make sure that any error events on these streams are handled by the
>   // domain
>   d.add(request);
>   d.add(response);
>
>   // make sure that asynchronous calls within the scope of the middleware
>   // chain are pulled into the domain
>   d.run(next);
> }
>
> 3. If you want something more strongly biased towards keeping your node 
> processes up and running, Adam Crabtree's trycatch (
> https://github.com/CrabDude/trycatch) takes a similar approach to domains 
> while even more tightly binding code that can fail to an error handler. At 
> this point, domains and trycatch to me feel like similar flavors of the 
> same strategy, but the trycatch API is dead simple and that might appeal to 
> you more than domains, which do have some complexity to them. Adam also has 
> a very different philosophy than the Node core team, where he wants to keep 
> Node processes running and actually recover from errors more often than 
> not, and trycatch reflects that philosophy.
>
> 4. In addition to the control flow alternatives others have mentioned, 
> some people like the way that promises compose for error-handling. I 
> personally find using streams with promises a little awkward, but if your 
> web service has any sort of pipeline (pull some data -> do something with 
> the data -> cache it in e.g. redis -> render a template -> shove it out the 
> response), promises might be a way to DRY up your error-handling in a way 
> that allows you to confine the consequences of exceptions. Also, if you use 
> Bluebird, you probably won't even pay that much of a performance penalty.
>
> Just trying to be extra-careful is probably not going to ever feel very 
> satisfying. There's a lot that can go wrong, and as much as the core tries 
> to be consistent about *either* throwing synchronously *or* emitting 
> 'error' events / passing Error objects to callbacks, there are a lot of 
> gotchas that only time, experience, and production crashes will make clear. 
> Domains are core's general solution to this problem, along with the 
> philosophy (that's been articulated here and elsewhere many times) that the 
> sane thing to do when your Node process encounters an error is to shut down 
> and restart the process.
>
> Another piece of this is to partition your services such that you don't 
> have the problem of 10,000 clients being at risk if one request crashes. 
> Think about scaling your app horizontally (using cluster or something 
> similar) to keep each process dealing with a smaller number of clients if 
> you can. PHP handles this better (in terms of your problem -- there are 
> always tradeoffs!) because each PHP script is being run in its own context 
> (which for all intents and purposes is a process -- if one PHP handler 
> crashes, it has no effect on the others), which is just a fundamentally 
> different model from Node.
>
>  
> On Tue, Jan 14, 2014 at 9:43 PM, Gregg Caines <cai...@gmail.com> wrote:
>
>> Well even though all the responses so far would require some pretty 
>> non-standard solutions (and therefore major changes to our current app), I 
>> really do appreciate them.  We have logging, metrics and alerts on server 
>> restarts, so we know about and fix restarts as fast as possible I believe, 
>> but losing 10,000+ user requests at once (per server!  and we have dozens 
>> of servers running!) due to one bad api endpoint is just not worth the risk 
>> of running like this anymore.  I'm definitely forced to consider the 
>> weirder solutions if there isn't a standard one.
>>
>> There have got to be others working on a standard yet somewhat large 
>> deployment that have similar concerns though.  How is everyone else 
>> managing this?  (And if your answer is "Be more careful", I'm going to 
>> assume you're not in the same situation.  Also: we've got a staging 
>> environment we test in first and nearly 100% test coverage.)
>>
>> G 
>>
>>
>> On Tuesday, January 14, 2014 7:40:51 PM UTC-8, tjholowaychuk wrote:
>>>
>>> check out Koa http://koajs.com/ you won't get separate stacks like you 
>>> do with node-fibers but similar otherwise (built with generators)
>>>
>>> On Tuesday, 14 January 2014 12:28:52 UTC-8, Gregg Caines wrote:
>>>>
>>>> Hey all... I'm wondering if anyone can point me to the current 
>>>> best-practice for isolating requests in a web app.  In general I'm trying 
>>>> to solve the problem of keeping the server running despite bad code in a 
>>>> particular request.  Are domains my only shot?  Do they completely solve 
>>>> it?  Does anyone have existing code?
>>>>
>>>> I'm on a somewhat large team, working on a somewhat large codebase, and 
>>>> until now I've been just logging restarts and combing logs for these types 
>>>> of errors, then fixing them (which I'll always do), but I'm starting to 
>>>> feel a bit silly with PHP having solved this 10 years ago.  ;)  When a bug 
>>>> does get through, it would be nice to not lose the whole server and the 
>>>> possible 10,000+ customer requests attached to it, while I scramble to fix 
>>>> it.
>>>>
>>>> Thanks for any ideas or pointers!
>>>>
>>>> G
>>>>
>>>  -- 
>> -- 
>> Job Board: http://jobs.nodejs.org/
>> Posting guidelines: 
>> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
>> You received this message because you are subscribed to the Google
>> Groups "nodejs" group.
>> To post to this group, send email to nod...@googlegroups.com
>> To unsubscribe from this group, send email to
>> nodejs+un...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/nodejs?hl=en?hl=en
>>  
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "nodejs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to nodejs+un...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>

