Re: [nodejs] Re: State of the art for request isolation in http servers?

2014-01-16 Thread Bruno Jouhier
Hi Alain,

We use streamline.js. We have developed a big application with it (> 100k 
lines of streamline source code). For an example of what a streamline app 
looks like (with exception handling), see 
https://github.com/Sage/streamlinejs/blob/master/tutorial/tutorial.md

Bruno

On Thursday, January 16, 2014 4:57:26 PM UTC+1, Alain Mouette wrote:
>
>  Could you share how you did that?
> I intend to test that kind of thing *before* building a real application, 
> I am new to node (I came from C++) and it may be an interesting learning 
> path (even if a bit difficult)
>
> Thanks
>
> Alain
> === Minha MesaXYZ:  
>  ===
>
> On 16-01-2014 06:48, Bruno Jouhier wrote:
>  
> No water-tight technical solution? There are some. Problem is that people 
> are reluctant to go with them. 
>
> For me, the exception handling problem was solved 3 years ago. Our 
> server is robust. If someone messes up in some kind of obscure piece of 
> code that nobody tested, nothing dramatic will happen. The client of the 
> service will get a 500, the full stacktrace will be logged (with both long 
> and raw stacktraces) and the server will go on with the other requests.
>
> Bruno 
>
> On Thursday, January 16, 2014 7:02:17 AM UTC+1, Tomasz Janczuk wrote: 
>>
>>> In general I'm trying to solve the problem of keeping the server 
>>> running despite bad code in a particular request
>>>
>>
>>  Given that no water-tight technical solution to this problem has been 
>> suggested on this thread, perhaps this entire problem should be approached 
>> from a different angle. Instead of trying to *prevent* server failures, 
>> let's *embrace* the fact that the server will fail, for a variety of 
>> reasons (bad user code, memory leak in Node or V8, etc). Then design the 
>> overall system around this assumption. Some ideas to that end:
>>  
>> 1. Create a cluster of many more child processes than the number of 
>>    CPU cores would suggest to reduce the impact of any one of them failing.  
>> 2. Add graceful shutdown logic to reduce the impact of failures you 
>>    can actually detect (like a JavaScript level exception), but do not 
>>    *prevent* the server from failing. For example, within an uncaught 
>>    exception handler shut down the HTTP server and only exit the process 
>>    after requests in flight have completed. But exit it. 
>> 3. Build failure recovery at all levels of the application stack. 
>>    For example, retry your ajax requests a number of times. 
>> 4. Build planned failures into your code to keep yourself honest as 
>>    to the quality of your recovery mechanism. Throw an unhandled exception 
>>    within 5-10 minutes of every process start time, by design.   
>> 5. Look at your software process to see if the bugs getting into the 
>>    production code could be avoided in the first place.  
>>
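Item 3 (retry requests a number of times) can be sketched as a small helper. This is a minimal sketch, not code from the thread: `withRetries` is a hypothetical name, and exponential backoff is an assumed retry policy. It is best limited to idempotent requests with a small attempt count, because, as Gregg notes further down the thread, a retried request that hits the same bug can take down another server.

```javascript
// Hypothetical helper: call `operation` up to `attempts` times, waiting
// baseDelayMs, 2*baseDelayMs, 4*baseDelayMs, ... between failed attempts.
async function withRetries(operation, { attempts = 3, baseDelayMs = 100 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

In a browser, or on Node 18+ where `fetch` is a global, a caller might pass an arrow function that calls `fetch` and throws when `response.ok` is false, so that 5xx responses are retried along with network errors.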
>>   -- 
> -- 
> Job Board: http://jobs.nodejs.org/
> Posting guidelines: 
> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to nod...@googlegroups.com 
> To unsubscribe from this group, send email to
> nodejs+un...@googlegroups.com 
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en
>  
> --- 
> You received this message because you are subscribed to the Google Groups 
> "nodejs" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to nodejs+un...@googlegroups.com .
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>  



Re: [nodejs] Re: State of the art for request isolation in http servers?

2014-01-16 Thread Alain Mouette

Could you share how you did that?
I intend to test that kind of thing *before* building a real 
application, I am new to node (I came from C++) and it may be an 
interesting learning path (even if a bit difficult)


Thanks

Alain
=== Minha MesaXYZ:  ===

On 16-01-2014 06:48, Bruno Jouhier wrote:
No water-tight technical solution? There are some. Problem is that 
people are reluctant to go with them.


For me, the exception handling problem was solved 3 years ago. 
Our server is robust. If someone messes up in some kind of obscure 
piece of code that nobody tested, nothing dramatic will happen. The 
client of the service will get a 500, the full stacktrace will be 
logged (with both long and raw stacktraces) and the server will go on 
with the other requests.
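Bruno's streamline.js code is not reproduced here. As a hedged illustration of the same behaviour (500 to the client, full stack logged, server keeps going), the sketch below swaps in native `async`/`await`, which arrived in Node after this thread and likewise lets one ordinary `try`/`catch` cover awaited asynchronous calls. `isolate`, `settingsPage` and `loadPreferences` are hypothetical names. Errors thrown from plain callbacks that are never awaited (for example a bare `setTimeout` callback) would still escape this wrapper.

```javascript
const http = require('http');

// Wrap an async request handler so that anything it throws, or any awaited
// promise that rejects, becomes a 500 for *this* request only; the process
// keeps serving everyone else.
function isolate(handler, log = console.error) {
  return async function isolatedHandler(req, res) {
    try {
      await handler(req, res);
    } catch (err) {
      // Log the full stack trace, then answer with a 500 if we still can.
      log(err && err.stack ? err.stack : String(err));
      if (!res.headersSent) {
        res.writeHead(500, { 'Content-Type': 'text/plain' });
      }
      if (!res.writableEnded) {
        res.end('Internal Server Error\n');
      }
    }
  };
}

// Hypothetical async lookup standing in for "some obscure piece of code that
// nobody tested": it fails only for one rarely used URL.
async function loadPreferences(url) {
  await new Promise((resolve) => setTimeout(resolve, 1));
  if (url === '/settings/obscure') {
    throw new Error('untested code path');
  }
  return { url };
}

// Hypothetical route handler.
async function settingsPage(req, res) {
  const prefs = await loadPreferences(req.url);
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(prefs));
}

function createServer() {
  return http.createServer(isolate(settingsPage));
}
```

With `createServer().listen(8080)`, a request for `/settings/obscure` would get a 500 while requests to other URLs keep succeeding in the same process.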


Bruno

On Thursday, January 16, 2014 7:02:17 AM UTC+1, Tomasz Janczuk wrote:

In general I'm trying to solve the problem of keeping the
server running despite bad code in a particular request


Given that no water-tight technical solution to this problem has
been suggested on this thread, perhaps this entire problem should
be approached from a different angle. Instead of trying to
/prevent/ server failures, let's /embrace/ the fact that the server
will fail, for a variety of reasons (bad user code, memory
leak in Node or V8, etc). Then design the overall system around
this assumption. Some ideas to that end:

 1. Create a cluster of many more child processes than the number
of CPU cores would suggest to reduce the impact of any one of
them failing.
 2. Add graceful shutdown logic to reduce the impact of failures
you can actually detect (like a JavaScript level exception),
but do not /prevent/ the server from failing. For example,
within an uncaught exception handler shut down the HTTP server
and only exit the process after requests in flight have
completed. But exit it.
 3. Build failure recovery at all levels of the application stack.
For example, retry your ajax requests a number of times.
 4. Build planned failures into your code to keep yourself honest
as to the quality of your recovery mechanism. Throw an
unhandled exception within 5-10 minutes of every process start
time, by design.
 5. Look at your software process to see if the bugs getting into
the production code could be avoided in the first place.
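Items 1, 2 and 4 could be sketched with Node's `cluster` module roughly as follows. This assumes a current Node.js release. The function names (`startPrimary`, `makeShutdownHandler`, `schedulePlannedFailure`) and the 4-workers-per-core ratio are hypothetical choices, not anything prescribed in the thread.

```javascript
const cluster = require('cluster');
const os = require('os');

// Item 1: many more workers than cores. The multiplier is an arbitrary,
// hypothetical choice; tune it for your own workload.
const WORKERS_PER_CPU = 4;

function startPrimary(workersPerCpu = WORKERS_PER_CPU) {
  const count = os.cpus().length * workersPerCpu;
  for (let i = 0; i < count; i++) cluster.fork();
  // Whatever the reason a worker died, replace it.
  cluster.on('exit', (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} died (${signal || code}); forking a replacement`);
    cluster.fork();
  });
}

// Item 2: on an error we can detect, stop accepting new connections, let
// requests in flight finish, then exit anyway (but never wait forever).
function makeShutdownHandler(server, {
  exit = (code) => process.exit(code),
  timeoutMs = 10000,
  log = console.error,
} = {}) {
  let shuttingDown = false;
  return function onFatalError(err) {
    log(err && err.stack ? err.stack : String(err));
    if (shuttingDown) return; // a second error while draining: keep draining
    shuttingDown = true;
    let exited = false;
    const exitOnce = () => {
      if (!exited) {
        exited = true;
        exit(1);
      }
    };
    // close() stops accepting connections; its callback fires once the
    // existing ones have ended.
    server.close(exitOnce);
    // Long-lived or stuck connections can hold close() open, so cap the wait.
    setTimeout(exitOnce, timeoutMs).unref();
  };
}

// Item 4: a deliberate, unhandled failure 5-10 minutes after start-up.
// Returns the chosen delay so callers can log it.
function schedulePlannedFailure(minMs = 5 * 60 * 1000, maxMs = 10 * 60 * 1000) {
  const delayMs = minMs + Math.random() * (maxMs - minMs);
  setTimeout(() => {
    throw new Error('planned failure (by design)');
  }, delayMs).unref(); // don't keep an otherwise idle process alive
  return delayMs;
}
```

In a worker you would create the HTTP server, then call `process.on('uncaughtException', makeShutdownHandler(server))` and `schedulePlannedFailure()`. The primary process just calls `startPrimary()`. `cluster.isPrimary` tells the two apart on Node 16+ (older releases call it `cluster.isMaster`). Because the primary re-forks unconditionally, a worker that crashes during start-up would be re-forked in a tight loop, so a real deployment would want some rate limiting there.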



Re: [nodejs] Re: State of the art for request isolation in http servers?

2014-01-15 Thread Gregg Caines
Thanks for spending the time on this, Forrest  (and everyone else so far as 
well of course!).

> that the sane thing to do when your Node process encounters an error is to 
> shut down and restart the process.


I actually agree that uncaught errors should crash the server, but what I'm 
asking for is a way to catch errors that happen within particular sections 
of code (the routes).  Other languages with exceptions allow you to catch 
exceptions at appropriate times other than the two options I feel like I'm 
faced with: right away (so write perfect code, and have perfect data), or 
not at all (resulting in a server restart).  If I know the places that I 
can safely "catch all", do something, then continue on, that should be 
possible -- this is common in a number of applications including most 
(all?) web applications.

> Another piece of this is to partition your services such that you don't 
> have the problem of 10,000 clients being at risk if one request crashes. 
> Think about scaling your app horizontally (using cluster or something 
> similar) to keep each process dealing with a smaller number of clients if 
> you can. PHP handles this better (in terms of your problem -- there are 
> always tradeoffs!) because each PHP script is being run in its own context 
> (which for all intents and purposes is a process -- if one PHP handler crashes, it 
> has no effect on the others), which is just a fundamentally different model 
> from Node.


Well the whole reason we pay "the async complexity tax" of node is to get 
10K+ concurrent connections at once.  Many people (with simpler apps) do 
better than that ( 
http://blog.caustik.com/2012/08/19/node-js-w1m-concurrent-connections/ ). 
 So, even though we're clustering across one or two dozen machines, a 
single process dying can still lose 10K+ requests in progress.  And of 
course it's common for a user to retry a failed request, taking down an 
entire server again.

> 3. If you want something more strongly biased towards keeping your node 
> processes up and running, Adam Crabtree's trycatch (
> https://github.com/CrabDude/trycatch) takes a similar approach to domains 
> while even more tightly binding code that can fail to an error handler. At 
> this point, domains and trycatch to me feel like similar flavors of the 
> same strategy, but the trycatch API is dead simple and that might appeal to 
> you more than domains, which do have some complexity to them. Adam also has 
> a very different philosophy than the Node core team, where he wants to keep 
> Node processes running and actually recover from errors more often than 
> not, and trycatch reflects that philosophy.


Thanks! I wrote some quick experiments, and trycatch is working nicely and 
seems to basically wrap domains with a simpler interface (IMO)... I think 
we'll give that a go.  Thanks!

G


On Tuesday, January 14, 2014 10:15:10 PM UTC-8, Forrest L Norvell wrote:
>
> As I see it, there are a few paths open to you that don't require you to 
> do a total rewrite using a different framework:
>
> 1. This is pretty much the exact problem that domains were designed to 
> solve. It's up to you to decide whether you want to recover from errors or 
> to shut down the process (gracefully, in a way that lets it finish handling 
> all the requests you have in flight), but they will at least let you put 
> the error-handling closer to where the errors are coming from and allow you 
> to deal with them and not just crash. Domains are still a part of the 
> platform, they're proven at this point, and while they're not perfect, 
> they're a fairly easy incremental step for you.
>
> 2. If you're using Express or Restify, it's pretty easy to write a 
> middleware that will make your middleware chain domains-aware:
>
> var domain = require('domain');
> function domainsify(request, response, next) {
>   var d = domain.create();
>   // ourHandler could stop the HTTP server, preventing any new requests from
>   // being handled while still allowing in-flight requests to finish
>   d.on('error', ourHandler);
>
>   // make sure that any error events on these streams are handled by the domain
>   d.add(request);
>   d.add(response);
>
>   // make sure that asynchronous calls within the scope of the middleware chain are pulled into the domain
>   d.run(next);
> }
>
> 3. If you want something more strongly biased towards keeping your node 
> processes up and running, Adam Crabtree's trycatch (
> https://github.com/CrabDude/trycatch) takes a similar approach to domains 
> while even more tightly binding code that can fail to an error handler. At 
> this point, domains and trycatch to me feel like similar flavors of the 
> same strategy, but the trycatch API is dead simple and that might appeal to 
> you more than domains, which do have some complexity to them. Adam also has 
> a very different philosophy than the Node core team, where he wants to keep 
> Node processes running an

Re: [nodejs] Re: State of the art for request isolation in http servers?

2014-01-15 Thread Alex Kocharin
When somebody goes to that low-importance section of the settings, he'll just trigger an automatic restart in a few minutes plus a bug report to the developers. That's how domains are supposed to work. Nobody is saying you should crash right away.

15.01.2014, 13:01, "Alexey Petrushin" :
>> the sane thing to do when your Node process encounters an error is to shut down and restart the process
>
> So, let's suppose we create some app, let's say a forum like this Google group. There is important, frequently used and well-tested stuff like showing the list of topics and posting a reply.
>
> And there are rarely used things, like some section deep in the settings screen. We don't care about that section of the settings screen - we don't test it very well, because who cares if it doesn't work; it's low-importance stuff anyway.
>
> And then - surprise - when someone goes to that low-importance section of the settings, suddenly the whole application crashes.

Re: [nodejs] Re: State of the art for request isolation in http servers?

2014-01-15 Thread Alexey Petrushin
> the sane thing to do when your Node process encounters an error is to 
> shut down and restart the process

So, let's suppose we create some app, let's say a forum like this Google 
group. There is important, frequently used and well-tested stuff like 
showing the list of topics and posting a reply.

And there are rarely used things, like some section deep in the settings 
screen. We don't care about that section of the settings screen - we 
don't test it very well, because who cares if it doesn't work; it's 
low-importance stuff anyway.

And then - surprise - when someone goes to that low-importance section of 
the settings, suddenly the whole application crashes.

On Wednesday, 15 January 2014 10:15:10 UTC+4, Forrest L Norvell wrote:
>
> As I see it, there are a few paths open to you that don't require you to 
> do a total rewrite using a different framework:
>
> 1. This is pretty much the exact problem that domains were designed to 
> solve. It's up to you to decide whether you want to recover from errors or 
> to shut down the process (gracefully, in a way that lets it finish handling 
> all the requests you have in flight), but they will at least let you put 
> the error-handling closer to where the errors are coming from and allow you 
> to deal with them and not just crash. Domains are still a part of the 
> platform, they're proven at this point, and while they're not perfect, 
> they're a fairly easy incremental step for you.
>
> 2. If you're using Express or Restify, it's pretty easy to write a 
> middleware that will make your middleware chain domains-aware:
>
> var domain = require('domain');
> function domainsify(request, response, next) {
>   var d = domain.create();
>   // ourHandler could stop the HTTP server, preventing any new requests from
>   // being handled while still allowing in-flight requests to finish
>   d.on('error', ourHandler);
>
>   // make sure that any error events on these streams are handled by the domain
>   d.add(request);
>   d.add(response);
>
>   // make sure that asynchronous calls within the scope of the middleware chain are pulled into the domain
>   d.run(next);
> }
>
> 3. If you want something more strongly biased towards keeping your node 
> processes up and running, Adam Crabtree's trycatch (
> https://github.com/CrabDude/trycatch) takes a similar approach to domains 
> while even more tightly binding code that can fail to an error handler. At 
> this point, domains and trycatch to me feel like similar flavors of the 
> same strategy, but the trycatch API is dead simple and that might appeal to 
> you more than domains, which do have some complexity to them. Adam also has 
> a very different philosophy than the Node core team, where he wants to keep 
> Node processes running and actually recover from errors more often than 
> not, and trycatch reflects that philosophy.
>
> 4. In addition to the control flow alternatives others have mentioned, 
> some people like the way that promises compose for error-handling. I 
> personally find using streams with promises a little awkward, but if your 
> web service has any sort of pipeline (pull some data -> do something with 
> the data -> cache it in e.g. redis -> render a template -> shove it out the 
> response), promises might be a way to DRY up your error-handling in a way 
> that allows you to confine the consequences of exceptions. Also, if you use 
> Bluebird, you probably won't even pay that much of a performance penalty.
>
> Just trying to be extra-careful is probably not going to ever feel very 
> satisfying. There's a lot that can go wrong, and as much as the core tries 
> to be consistent about *either* throwing synchronously *or* emitting 
> 'error' events / passing Error objects to callbacks, there are a lot of 
> gotchas that only time, experience, and production crashes will make clear. 
> Domains are core's general solution to this problem, along with the 
> philosophy (that's been articulated here and elsewhere many times) that the 
> sane thing to do when your Node process encounters an error is to shut down 
> and restart the process.
>
> Another piece of this is to partition your services such that you don't 
> have the problem of 10,000 clients being at risk if one request crashes. 
> Think about scaling your app horizontally (using cluster or something 
> similar) to keep each process dealing with a smaller number of clients if 
> you can. PHP handles this better (in terms of your problem -- there are 
> always tradeoffs!) because each PHP script is being run in its own context 
> (which for all intents and purposes is a process -- if one PHP handler crashes, it 
> has no effect on the others), which is just a fundamentally different model 
> from Node.
>
>  
> On Tue, Jan 14, 2014 at 9:43 PM, Gregg Caines wrote:
>
>> Well even though all the responses so far would require some pretty 
>> non-standard solutions (and therefore major changes to our current app), I 
>> really do appreciate them.  We have logging, metrics and alerts

Re: [nodejs] Re: State of the art for request isolation in http servers?

2014-01-14 Thread Alex Kocharin
You can use this module: https://github.com/CrabDude/trycatch It won't require any major changes to the app. I think this is exactly what you are looking for.

As for a better general solution, I'd second @tj on that: generators are a nice idea. I didn't try koa, but it looks promising. Also, this article - http://spion.github.io/posts/why-i-am-switching-to-promises.html - can be useful; it almost made me change the way I do that, but not completely. :)

15.01.2014, 09:43, "Gregg Caines" :
> Well even though all the responses so far would require some pretty non-standard solutions (and therefore major changes to our current app), I really do appreciate them.  We have logging, metrics and alerts on server restarts, so we know about and fix restarts as fast as possible I believe, but losing 10,000+ user requests at once (per server!  and we have dozens of servers running!) due to one bad api endpoint is just not worth the risk of running like this anymore.  I'm definitely forced to consider the weirder solutions if there isn't a standard one.
>
> There have got to be others working on a standard yet somewhat large deployment that have similar concerns though.  How is everyone else managing this?   (And if your answer is "Be more careful", I'm going to assume you're not in the same situation.  Also: we've got a staging environment we test in first and nearly 100% test coverage  )
>
> G
>
> On Tuesday, January 14, 2014 7:40:51 PM UTC-8, tjholowaychuk wrote:
>> check out Koa http://koajs.com/ you won't get separate stacks like you do with node-fibers but similar otherwise (built with generators)
>>
>> On Tuesday, 14 January 2014 12:28:52 UTC-8, Gregg Caines wrote:
>>> Hey all... I'm wondering if anyone can point me to the current best-practice for isolating requests in a web app.  In general I'm trying to solve the problem of keeping the server running despite bad code in a particular request.  Are domains my only shot?  Do they completely solve it?  Does anyone have existing code?
>>>
>>> I'm on a somewhat large team, working on a somewhat large codebase, and until now I've been just logging restarts and combing logs for these types of errors, then fixing them (which I'll always do), but I'm starting to feel a bit silly with PHP having solved this 10 years ago.  ;)  When a bug does get through, it would be nice to not lose the whole server and the possible 10,000+ customer requests attached to it, while I scramble to fix it.
>>>
>>> Thanks for any ideas or pointers!
>>>
>>> G





Re: [nodejs] Re: State of the art for request isolation in http servers?

2014-01-14 Thread Forrest L Norvell
As I see it, there are a few paths open to you that don't require you to do
a total rewrite using a different framework:

1. This is pretty much the exact problem that domains were designed to
solve. It's up to you to decide whether you want to recover from errors or
to shut down the process (gracefully, in a way that lets it finish handling
all the requests you have in flight), but they will at least let you put
the error-handling closer to where the errors are coming from and allow you
to deal with them and not just crash. Domains are still a part of the
platform, they're proven at this point, and while they're not perfect,
they're a fairly easy incremental step for you.

2. If you're using Express or Restify, it's pretty easy to write a
middleware that will make your middleware chain domains-aware:

var domain = require('domain');
function domainsify(request, response, next) {
  var d = domain.create();
  // ourHandler could stop the HTTP server, preventing any new requests from
  // being handled while still allowing in-flight requests to finish
  d.on('error', ourHandler);

  // make sure that any error events on these streams are handled by the domain
  d.add(request);
  d.add(response);

  // make sure that asynchronous calls within the scope of the middleware chain are pulled into the domain
  d.run(next);
}
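The middleware above relies on an `ourHandler` that isn't shown. A self-contained sketch of the same one-domain-per-request idea might look like this. It uses plain `http` instead of Express or Restify so it has no dependencies, and `sendServerError`, `handleInDomain` and `createIsolatedServer` are hypothetical names. Note that the Node.js documentation now marks the `domain` module as deprecated, although it still works in current releases.

```javascript
const domain = require('domain');
const http = require('http');

// Hypothetical stand-in for `ourHandler`: log the error and answer this
// request with a 500 (unless the response has already been finished).
function sendServerError(res, err) {
  console.error(err && err.stack ? err.stack : String(err));
  if (res.writableEnded) return;
  if (!res.headersSent) {
    res.writeHead(500, { 'Content-Type': 'text/plain' });
  }
  res.end('Internal Server Error\n');
}

// Same shape as domainsify(): one domain per request, with the request and
// response streams added so their 'error' events land in the domain too.
function handleInDomain(req, res, handler, onError = sendServerError) {
  const d = domain.create();
  d.on('error', (err) => onError(res, err));
  d.add(req);
  d.add(res);
  // Callbacks scheduled inside run() (timers, I/O) stay bound to this domain,
  // so an exception thrown later from one of them is routed to 'error' above.
  d.run(() => handler(req, res));
}

function createIsolatedServer(handler) {
  return http.createServer((req, res) => handleInDomain(req, res, handler));
}
```

This version only answers the failing request. The `domain` docs' own example goes further and starts a graceful shutdown after sending the 500, which matches the "ourHandler could stop the HTTP server" comment above.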

3. If you want something more strongly biased towards keeping your node
processes up and running, Adam Crabtree's trycatch (
https://github.com/CrabDude/trycatch) takes a similar approach to domains
while even more tightly binding code that can fail to an error handler. At
this point, domains and trycatch to me feel like similar flavors of the
same strategy, but the trycatch API is dead simple and that might appeal to
you more than domains, which do have some complexity to them. Adam also has
a very different philosophy than the Node core team, where he wants to keep
Node processes running and actually recover from errors more often than
not, and trycatch reflects that philosophy.

4. In addition to the control flow alternatives others have mentioned, some
people like the way that promises compose for error-handling. I personally
find using streams with promises a little awkward, but if your web service
has any sort of pipeline (pull some data -> do something with the data ->
cache it in e.g. redis -> render a template -> shove it out the response),
promises might be a way to DRY up your error-handling in a way that allows
you to confine the consequences of exceptions. Also, if you use Bluebird,
you probably won't even pay that much of a performance penalty.
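The pipeline idea in point 4 might be sketched as follows. This sketch uses native `Promise` rather than Bluebird, and the stage functions and the names `runPipeline` and `respondWithPipeline` are hypothetical. The point it shows is that a single `.catch` at the end handles both rejected promises and synchronous throws inside any stage. Errors thrown from raw callbacks that a stage never wraps in a promise are not covered.

```javascript
// Run a request through a list of stages (e.g. load -> transform -> cache -> render).
// Each stage receives the previous stage's result and may return a value or a promise.
function runPipeline(stages, input) {
  return stages.reduce((previous, stage) => previous.then(stage), Promise.resolve(input));
}

// One .catch covers every stage: a rejected promise *or* a synchronous throw
// inside any stage turns into a 500 for this request only.
function respondWithPipeline(res, stages, input, log = console.error) {
  return runPipeline(stages, input)
    .then((html) => {
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end(html);
    })
    .catch((err) => {
      log(err && err.stack ? err.stack : String(err));
      if (!res.headersSent) {
        res.writeHead(500, { 'Content-Type': 'text/plain' });
      }
      res.end('Internal Server Error\n');
    });
}
```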

Just trying to be extra-careful is probably not going to ever feel very
satisfying. There's a lot that can go wrong, and as much as the core tries
to be consistent about *either* throwing synchronously *or* emitting
'error' events / passing Error objects to callbacks, there are a lot of
gotchas that only time, experience, and production crashes will make clear.
Domains are core's general solution to this problem, along with the
philosophy (that's been articulated here and elsewhere many times) that the
sane thing to do when your Node process encounters an error is to shut down
and restart the process.

Another piece of this is to partition your services such that you don't
have the problem of 10,000 clients being at risk if one request crashes.
Think about scaling your app horizontally (using cluster or something
similar) to keep each process dealing with a smaller number of clients if
you can. PHP handles this better (in terms of your problem -- there are
always tradeoffs!) because each PHP script is being run in its own context
(which for all intents and purposes is a process -- if one PHP handler crashes, it
has no effect on the others), which is just a fundamentally different model
from Node.


On Tue, Jan 14, 2014 at 9:43 PM, Gregg Caines  wrote:

> Well even though all the responses so far would require some pretty
> non-standard solutions (and therefore major changes to our current app), I
> really do appreciate them.  We have logging, metrics and alerts on server
> restarts, so we know about and fix restarts as fast as possible I believe,
> but losing 10,000+ user requests at once (per server!  and we have dozens
> of servers running!) due to one bad api endpoint is just not worth the risk
> of running like this anymore.  I'm definitely forced to consider the
> weirder solutions if there isn't a standard one.
>
> There have got to be others working on a standard yet somewhat large
> deployment that have similar concerns though.  How is everyone else
> managing this?   (And if your answer is "Be more careful", I'm going to
> assume you're not in the same situation.  Also: we've got a staging
> environment we test in first and nearly 100% test coverage  )
>
> G
>
>
> On Tuesday, January 14, 2014 7:40:51 PM UTC-8, tjholowaychuk wrote:
>>
>> check out Koa http://koajs.com/ you won't get separate stacks like you
>> do with node-fibers but simi