Re: [node-dev] Re: process.nextTick semantics

Mikeal Rogers Fri, 01 Jun 2012 13:46:13 -0700

On Jun 1, 2012, at 1:09 PM, Marco Rogers wrote:

> 
> 
> On Fri, Jun 1, 2012 at 12:19 PM, Isaac Schlueter <[email protected]> wrote:
> First of all, Mikeal Rogers is not a "core guy who doesn't write
> apps".  Mikeal Rogers has written several apps with node, and has
> authored very few patches to node core.  He has a startup, and has
> been building applications with Node since the 0.1 days.
> 
> Being a core guy has nothing to do with how many patches you have in core or 
> whether you also build apps. Felix builds apps. I have patches in core but 
> also build apps. Being a "core guy" means you have a tremendous amount of 
> influence on the way things go. If Mikeal did not agree with this change, we 
> would likely not be talking about it. Not saying you wouldn't want to fix the 
> core problem, but if Mikeal wanted to protect nextTick for whatever reason, 
> then changing it wouldn't be on the table. As opposed to the rest of us who 
> have to convince you not to change it, once you've already decided it should 
> be. That's what's happening here.


Several changes have gone into node, in areas where I have more influence than this, 
that I strongly disagreed with. The biggest being the removal of "pause" and 
"resume" events from Stream.pipe().

I doubt that, if I were so inclined, I could torpedo this change.

>  
> 
> And yes, I work on the node core project, but I'm also a Joyent
> employee.  My thoughts on this came from debugging node in production
> applications at scale, especially the memory leaks and HTTP errors
> that Voxer has seen on a regular basis.  Also, some of these problems
> have manifested in the node programs that Joyent uses to run data
> centers for real customers, pushing loads of network traffic.
> 
> Our complicated nextTick semantics are causing real problems for real
> applications.  There is no ivory-tower-ism here.  Node exists in the
> real world.  That's why we need to make nextTick work properly for the
> cases where it's actually used.  Complying with some notion of the
> "correct" meaning of "tick" and "next" is completely not a priority.
> 
> Bullshit. Yes the tick semantics are complicated. But I think what you've 
> seen is that lots of people have built up their own mental model of how the 
> event loop works and use that mental model to good effect. These mental 
> models may be flawed but they are important. Reasoning about the async nature 
> of node is essential to getting it right. The reason this is even a problem 
> is at least partially due to getting this wrong in the first place.
> 
> I agree that this is about real applications, and I don't like that people are 
> jumping to conclusions. I'm referring to the meaning of "tick" and how 
> important it is to people understanding the "recommended" way of writing 
> proper node programs. Naming and consistency are a big part of building this 
> understanding. Please don't ignore this.

I think what Isaac means is that the way people "think" nextTick() works is 
more important than whatever meaning we can read into "next" and "tick" as 
words on their own.

I think Isaac did a good job of showing that there are two interpretations of 
the current API, and that both are actually out of alignment with the current 
implementation. Making the implementation fit one of those is more important 
than making it fit the literal meaning of "next" and "tick".

In other words, Isaac is not trying to design a name to fit the API or make 
the API fit the exact meaning of the words; his goal is instead to make it fit 
one of the models people actually have for nextTick(). I believe his suggested 
solution for the other mental model for nextTick() is to add setImmediate.
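
To make the two mental models concrete, here is a minimal sketch. It assumes 
a Node version that ships setImmediate (it did not exist when this thread was 
written) and in which nextTick callbacks run before the event loop moves on. 
The `order` array is only an illustrative name.

```javascript
// Records the order in which each scheduling mechanism fires.
const order = [];

// "Run after pending I/O has had a chance": the second mental model.
setImmediate(() => order.push('setImmediate'));

// "Run right after the current stack unwinds": the first mental model.
process.nextTick(() => order.push('nextTick'));

order.push('sync');

// A later setImmediate runs after everything queued above has fired.
setImmediate(() => console.log(order.join(' -> ')));
// prints "sync -> nextTick -> setImmediate"
```

setTimeout(fn, 0) is left out on purpose: its ordering relative to 
setImmediate from the main module is not guaranteed.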

> 
> 
> I didn't ask for feedback as a joke.  I asked for feedback because I
> wanted to know what problems we would encounter if we changed nextTick
> so that it actually works for its intended use case.  The conclusion
> that I eventually drew (the plan for node v0.9) is informed by that
> feedback, and was not known at the outset.
> 
> There were two use cases presented in objection to the proposal:
> 1. Using process.nextTick to "break up" CPU intensive operations.
> 2. Using process.nextTick as an idle listener.
> 
> #2 is a valid use case for which there is no reasonable API at the
> moment, but nextTick is *not* an idle listener anyway, since it will
> frequently be fired in advance of pending IO.  setTimeout(fn) is
> better, but we probably ought to implement setImmediate or something
> similar.  That was new information, and is very useful.  Thank you,
> everyone.
> 
> #1 is not a valid use case.  nextTick is horrible for this, I just
> can't put it any other way.  Either the operations are fast enough to
> be done in a single thread, and a loop is fine, or they're not, and
> you need to put them into a child process.
> 
> This is where your hubris is showing. Making this kind of presumption 
> requires you to dictate the constraints of another person's system. Maybe 
> your dataset doesn't stream. Maybe you can't use redis or memcache for some 
> reason. You keep using the wrong word to describe #1. It IS a valid use case. 
> It's just that you don't recommend it. And you are completely ignoring the 
> fact that it only became not recommended very recently, when in fact we have 
> been actively pushing people to next tick to defer execution for a long time 
> now. Now you've decided that it's not only "not recommended" but a "bad 
> idea". This is the height of presumption.

If a compelling enough argument can be made for this use case then I don't 
think we'll fight a solution that fits it. What we are saying is that 
nextTick() does *not* fit it and it's being used for this case at the moment 
because we've failed to provide a better solution.

I'm the one who doubts the validity of the use case entirely. Specifically, I 
contend that processing a collection of "indefinite size" should never be 
handled by loading the entire thing into finite memory.
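
As a rough illustration of that point, the sketch below consumes a source one 
item at a time in bounded batches and yields between batches with 
setImmediate rather than nextTick, so pending IO gets a turn. It assumes a 
Node version with setImmediate; `numbers` and `sumInBatches` are hypothetical 
names, not part of any API discussed in this thread.

```javascript
// Hypothetical source: yields items one at a time instead of building an array.
function* numbers(limit) {
  for (let i = 1; i <= limit; i++) yield i;
}

// Consumes the iterator in bounded batches. Between batches it schedules the
// next one with setImmediate, which lets the event loop service pending I/O.
function sumInBatches(iterator, batchSize, done) {
  let total = 0;
  function runBatch() {
    for (let i = 0; i < batchSize; i++) {
      const next = iterator.next();
      if (next.done) return done(null, total);
      total += next.value;
    }
    setImmediate(runBatch);
  }
  // Start asynchronously so `done` never fires before this function returns.
  setImmediate(runBatch);
}

sumInBatches(numbers(100000), 1000, (err, total) => {
  console.log(total); // prints 5000050000
});
```

Memory use stays bounded by the batch size rather than by the size of the 
collection, and no single turn of the loop runs longer than one batch.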

>  
> 
> Meanwhile, we have lots of real world cases where nextTick is causing
> actual problems for real applications.  It's adding latency and
> causing errors to be thrown.  Phone conversations are interrupted by
> it, and error pages are showing up in web browsers and applications.
> It's an insidious bug that only manifests under load, and it must be
> fixed.
> 
> Also, it seems that the documentation of process.nextTick needs to be
> improved along with this change, because there is widespread
> misunderstanding about how it ought to be used.  The impression I'm
> getting is that a lot of people treat nextTick a bit like a background
> fork, and it's really not a good idea to use it that way.
> 
> Yes, the reason there is a widespread misunderstanding is because it was not 
> misunderstood until a few days ago. And to be clear, nextTick is not being used 
> as a "background fork". It's being used as an efficient way to kill the current 
> stack to yield ongoing execution and let the event loop continue. It's not 
> about explicitly supporting "CPU intensive operations". For most people it 
> has always been a good practice before calling a callback if you can't be 
> sure if it's async. This has been a best practice of node for as long as I 
> can remember. Yielding to the event loop periodically protects your 
> throughput.
> 
> Consider functions that check an in-memory cache and return without 
> doing i/o. You don't want to have the callback be synchronous, so you throw 
> it into a nextTick call. That was literally a best practice until 2 days ago. 
> Now it's a terrible idea.

That use case is solved just as well with the change that Isaac is proposing. 
The only use case that his proposed change steps on is CPU intensive operations 
that might starve IO.

Regardless of people's impression of nextTick(), your use case here will work as 
well after this change as before it. Technically, it'll be a little faster.
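
For reference, the cache pattern in question looks roughly like this. It is a 
minimal sketch: `cache`, `get`, and `loadFromBackend` are hypothetical names, 
and the timer merely stands in for real IO.

```javascript
const cache = new Map([['a', 1]]);

// Hypothetical slow lookup standing in for real I/O.
function loadFromBackend(key, callback) {
  setTimeout(() => callback(null, key.length), 10);
}

function get(key, callback) {
  if (cache.has(key)) {
    const value = cache.get(key);
    // Cache hit: defer so the callback never fires before get() returns.
    process.nextTick(() => callback(null, value));
    return;
  }
  // Cache miss: the backend call is already asynchronous.
  loadFromBackend(key, (err, value) => {
    if (!err) cache.set(key, value);
    callback(err, value);
  });
}

let returned = false;
get('a', (err, value) => {
  console.log(returned, value); // prints "true 1"
});
returned = true;
```

Each call adds a single deferred handler, which is the case that behaves the 
same before and after the proposed change.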

> Because if your system ends up doing this often, you are negatively 
> affecting your i/o throughput because you're not actually yielding to pending 
> i/o. In practice, I don't think this will happen often because most people 
> are doing real i/o. But IMO, the potential for this biting people is 
> just as likely as what is biting people right now at high load. In fact, it's 
> more insidious because people's programs might not fail. They'll just see 
> their throughput profiles change and not really be sure why.

Not true. You're talking about adding a single handler, not recursively adding 
handlers forever. I'm not seeing a case where you could legitimately add so 
many of these handlers, none of them CPU intensive on its own, that you would 
starve IO.

From what I can tell this use case will work better and won't starve IO unless 
you have a bug where you're recursively adding handlers forever.

If you do add so many handlers recursively that you starve IO then you'll hit 
the guard I already proposed, so it will fail.
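
The difference between the two cases can be sketched as follows, assuming a 
Node version where the whole nextTick queue drains before the event loop 
advances. The recursion is bounded at 1000 here so the script terminates; the 
setImmediate callback is only a stand-in for pending IO, and no guard is 
shown because its exact form is not spelled out in this message.

```javascript
const events = [];
let remaining = 1000;

// Stand-in for pending I/O: it can only run once the nextTick queue is empty.
setImmediate(() => events.push('io'));

// Each call re-queues itself, so the queue never empties until the count
// runs out. An unbounded version of this would starve I/O forever.
function spin() {
  if (--remaining > 0) {
    process.nextTick(spin);
  } else {
    events.push('ticks done');
  }
}
process.nextTick(spin);

setImmediate(() => console.log(events.join(', ')));
// prints "ticks done, io"
```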

> 
> 
> Are there other new use cases not already brought up in this thread?
> If so, please share them.  They may make a big difference.
> 
> For those who feel disenfranchised by this decision, please consider
> where I'm coming from.  Half the people in node assumed that it worked
> one way, and the other half believed it worked the other way, and in
> fact, both were at least partly wrong.  It's failing at its intended
> purpose, and causing http to be subtly broken in some cases, which is
> causing real world problems.
> 
> We really would've tried to consider where you were coming from if you and 
> Mikeal had not started telling everyone how their applications were bad 
> because they dared to actually use nextTick in a context you don't approve of 
> (again a seemingly recent development).

You make it sound like we called all these developers names; we didn't. We said 
"that's not what this is for, that's what this other thing is for." A perfectly 
valid point for a technical argument. The only time I got upset was about the 
benchmark which tested something that wasn't even a real use case.

> 
> As for how broken nextTick is, I think you're still missing the point of why 
> people feel disenfranchised. nextTick is behaving badly for some people. And 
> it's the people you are closest to. Your friends and acquaintances who are 
> building node apps, joyent customers who have huge node deployments, or 
> whatever. That's all fine and it's great that you are trying to address their 
> concerns. But as I said before, to say nextTick is "broken" is to ignore 
> everyone else who's telling you that it's not and that we are using it just 
> fine. You are confusing the importance of your problem for the importance of 
> the solution you've chosen. And my feeling is that the amount of code that will 
> have to change because of this will be far greater than the amount that will 
> change if we provided a different solution to the people who actually 
> experience the data missing problem.

It has nothing to do with who people are friends with. It has to do with people 
that are pushing node to its limits and finding problems. Historically, this 
is how we've solved most of the problems in node. We listen to people that 
spend enough time with node that they discover the rough edges and bugs. That's 
what makes node so good, nobody is happy with node having limitations and the 
people who work on core are motivated to widen the boundaries of what node 
programs can do.

Here is why your proposed solution, of providing a new API for people who have 
enough load to step on this, will fail. Code not just in core, but in 
request, filed, and most other stream libraries that use nextTick() as a way to 
process data before IO to check the state of the stream, fails under load. 
All of them, without a doubt, fail under load because of this issue. Forget 
about all the code you think might need to change after Isaac's proposed 
changes, far more code is broken now under load than will break in the future 
with the change. 
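
The general shape of that pattern, as a hedged sketch: the object is returned 
immediately, and the real work is deferred with nextTick so the caller can 
finish configuring it first. `createRequest` and its `start` event are 
hypothetical and do not reproduce the API of request, filed, or core.

```javascript
const { EventEmitter } = require('events');

// Hypothetical request-like object, not the API of any real library.
function createRequest(url) {
  const req = new EventEmitter();
  req.headers = {};
  // Defer the start so the caller can finish configuring the object first.
  // The pattern relies on this running before any I/O is processed.
  process.nextTick(() => {
    req.emit('start', { url, headers: req.headers });
  });
  return req;
}

const req = createRequest('http://example.invalid/');
req.headers['x-demo'] = '1'; // set after creation, but before 'start' fires
req.on('start', (info) => console.log(info.headers['x-demo'])); // prints "1"
```

If IO could be processed between createRequest() returning and the deferred 
callback running, the state checked in that callback could already be stale.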

Saying that this is just a problem for "people under load", and they can solve 
this themselves, isn't good enough. That's what Python and Ruby do, and we're 
better than that. All node programs and libraries that use APIs in the way they 
are recommended should work under load. If they don't, then that is EXACTLY what 
core is supposed to fix.

>  
> 
> We can make node much more reliable for high-traffic HTTP servers
> (which is what node is explicitly for), at the expense of making it
> slightly worse at some approaches to high-CPU use cases (which is what
> the child process API is for), and make node's internals simpler in
> the process.
> 
> We can't please everyone.  But please don't be so presumptuous to
> assume that this is somehow about library authors being out of touch
> with real world application developers.  This is about choosing which
> real world application developers to please, and which developers are
> going to have to change their code.
> 
> This is a very reasonable statement. You are making a hard decision. And you 
> are seeing the consequences of it. I hope it works out for you. I have put up 
> with a lot of changes in node that I didn't agree with and I suspect that 
> most people will get over this one as well. But it seems like you guys also 
> want warm and fuzzies every time you make these hard decisions. That's 
> probably not going to happen.
> 
> :Marco
> 
> 
> -- 
> Marco Rogers
> [email protected] | https://twitter.com/polotek
> 
> Life is ten percent what happens to you and ninety percent how you respond to 
> it.
> - Lou Holtz
