Re: [node-dev] Re: process.nextTick semantics

Marco Rogers Fri, 01 Jun 2012 13:09:37 -0700

On Fri, Jun 1, 2012 at 12:19 PM, Isaac Schlueter <[email protected]> wrote:


> First of all, Mikeal Rogers is not a "core guy who doesn't write
> apps".  Mikeal Rogers has written several apps with node, and has
> authored very few patches to node core.  He has a startup, and has
> been building applications with Node since the 0.1 days.
>

Being a core guy has nothing to do with how many patches you have in core
or whether you also build apps. Felix builds apps. I have patches in core
but also build apps. Being a "core guy" means you have a tremendous amount
of influence on the way things go. If Mikeal did not agree with this
change, we would likely not be talking about it. Not saying you wouldn't
want to fix the core problem, but if Mikeal wanted to protect nextTick for
whatever reason, then changing it wouldn't be on the table. As opposed to
the rest of us who have to convince you not to change it, once you've
already decided it should be. That's what's happening here.


>
> And yes, I work on the node core project, but I'm also a Joyent
> employee.  My thoughts on this came from debugging node in production
> applications at scale, especially the memory leaks and HTTP errors
> that Voxer has seen on a regular basis.  Also, some of these problems
> have manifested in the node programs that Joyent uses to run data
> centers for real customers, pushing loads of network traffic.
>
> Our complicated nextTick semantics are causing real problems for real
> applications.  There is no ivory-tower-ism here.  Node exists in the
> real world.  That's why we need to make nextTick work properly for the
> cases where it's actually used.  Complying with some notion of the
> "correct" meaning of "tick" and "next" is completely not a priority.
>

Bullshit. Yes the tick semantics are complicated. But I think what you've
seen is that lots of people have built up their own mental model of how the
event loop works and use that mental model to good effect. These mental
models may be flawed but they are important. Reasoning about the async
nature of node is essential to getting it right. The reason this is even a
problem is at least partially due to getting this wrong in the first place.

I agree that this is about real applications, and I don't like that people
are jumping to conclusions. I'm referring to the meaning of "tick" and how
important it is to people understanding the "recommended" way of writing
proper node programs. Naming and consistency are a big part of building this
understanding. Please don't ignore this.


> I didn't ask for feedback as a joke.  I asked for feedback because I
> wanted to know what problems we would encounter if we changed nextTick
> so that it actually works for its intended use case.  The conclusion
> that I eventually drew (the plan for node v0.9) is informed by that
> feedback, and was not known at the outset.
>
> There were two use cases presented in objection to the proposal:
> 1. Using process.nextTick to "break up" CPU intensive operations.
> 2. Using process.nextTick as an idle listener.
>
> #2 is a valid use case for which there is no reasonable API at the
> moment, but nextTick is *not* an idle listener anyway, since it will
> frequently be fired in advance of pending IO.  setTimeout(fn) is
> better, but we probably ought to implement setImmediate or something
> similar.  That was new information, and is very useful.  Thank you,
> everyone.
>
> #1 is not a valid use case.  nextTick is horrible for this, I just
> can't put it any other way.  Either the operations are fast enough to
> be done in a single thread, and a loop is fine, or they're not, and
> you need to put them into a child process.
>

This is where your hubris is showing. Making this kind of presumption
requires you to dictate the constraints of another person's system. Maybe
your dataset doesn't stream. Maybe you can't use redis or memcache for some
reason. You keep using the wrong word to describe #1. It IS a valid use
case. It's just that you don't recommend it. And you are completely
ignoring the fact that it only became not recommended very recently, when
in fact we have been actively pushing people to use nextTick to defer
execution for a long time now. Now you've decided that it's not only "not
recommended" but a "bad idea". This is the height of presumption.
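
For reference, use case #1 looks roughly like the sketch below. The names
`sumInChunks` and `defer` are hypothetical, not anything from node core. The
deferral function is passed in as a parameter on purpose, because which one
actually yields to pending i/o is exactly what this thread is arguing about.

```javascript
// Hypothetical helper: sum a large array in slices, deferring between
// slices so other queued work gets a chance to run.
function sumInChunks(items, chunkSize, defer, done) {
  let total = 0;
  let index = 0;

  function step() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      total += items[index];
    }
    if (index < items.length) {
      defer(step); // hand control back before the next slice
    } else {
      done(null, total);
    }
  }

  // Defer the first slice too, so `done` is never called synchronously.
  defer(step);
}

// Usage: process.nextTick is the deferral under debate here;
// (fn) => setTimeout(fn, 0) is the alternative from the quoted message.
sumInChunks([1, 2, 3, 4, 5], 2, (fn) => process.nextTick(fn), (err, total) => {
  console.log(total);
});
```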


>
> Meanwhile, we have lots of real world cases where nextTick is causing
> actual problems for real applications.  It's adding latency and
> causing errors to be thrown.  Phone conversations are interrupted by
> it, and error pages are showing up in web browsers and applications.
> It's an insidious bug that only manifests under load, and it must be
> fixed.
>
> Also, it seems that the documentation of process.nextTick needs to be
> improved along with this change, because there is widespread
> misunderstanding about how it ought to be used.  The impression I'm
> getting is that a lot of people treat nextTick a bit like a background
> fork, and it's really not a good idea to use it that way.
>

Yes, the reason there is a widespread misunderstanding is because it was
not misunderstood until a few days ago. And to be clear, nextTick is not
being used as a "background fork". It's being used as an efficient way to
kill the current stack to yield ongoing execution and let the event loop
continue. It's not about explicitly supporting "CPU intensive operations".
For most people it has always been good practice to use it before calling a
callback if you can't be sure it's async. This has been a best practice of
node for as long as I can remember. Yielding to the event loop periodically
protects your throughput.

Consider functions that check an in-memory cache and return without doing
i/o. You don't want to have the callback be synchronous, so you throw it
into a nextTick call. That was literally a best practice until 2 days ago.
Now it's a terrible idea. Because if your system ends up doing this often,
you are negatively affecting your i/o throughput because you're not
actually yielding to pending i/o. In practice, I don't think this will
happen often because most people are doing real i/o. But IMO, the potential
for this biting people is just as high as for what is biting people right
now at high load. In fact, it's more insidious because people's programs
might not fail. They'll just see their throughput profiles change and not
really be sure why.
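
A minimal sketch of that cache pattern, for anyone who hasn't seen it. The
names `cache`, `loadFromSource` and `get` are made up for illustration, and
the `setTimeout` is only a stand-in for real i/o. The point is just that the
callback never fires on the caller's stack, whether or not the value was
cached.

```javascript
// Hypothetical in-memory cache in front of some slower lookup.
const cache = new Map();

// Stand-in for real i/o (disk, network, database).
function loadFromSource(key, cb) {
  setTimeout(() => cb(null, 'value-for-' + key), 10);
}

function get(key, cb) {
  if (cache.has(key)) {
    const value = cache.get(key);
    // Cache hit: no i/o needed, but defer the callback anyway so callers
    // always see it fire after the current stack has unwound.
    process.nextTick(() => cb(null, value));
    return;
  }

  // Cache miss: the callback is already async because of the i/o.
  loadFromSource(key, (err, value) => {
    if (err) return cb(err);
    cache.set(key, value);
    cb(null, value);
  });
}
```

A caller can then rely on code written after `get(...)` running before the
callback does, regardless of which branch was taken.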


> Are there other new use cases not already brought up in this thread?
> If so, please share them.  They may make a big difference.
>
> For those who feel disenfranchised by this decision, please consider
> where I'm coming from.  Half the people in node assumed that it worked
> one way, and the other half believed it worked the other way, and in
> fact, both were at least partly wrong.  It's failing at its intended
> purpose, and causing http to be subtly broken in some cases, which is
> causing real world problems.
>

We really would've tried to consider where you were coming from if you and
Mikeal had not started telling everyone how their applications were bad
because they dared to actually use nextTick in a context you don't approve
of (again a seemingly recent development).

As for how broken nextTick is, I think you're still missing the point of
why people feel disenfranchised. nextTick is behaving badly for some
people. And it's the people you are closest to. Your friends
and acquaintances who are building node apps, joyent customers who have
huge node deployments, or whatever. That's all fine and it's great that you
are trying to address their concerns. But as I said before, to say nextTick
is "broken" is to ignore everyone else who's telling you that it's not and
that we are using it just fine. You are confusing the importance of your
problem for the importance of the solution you've chosen. And my feeling is
that the amount of code that will have to change because of this will be far
greater than the amount that will change if we provided a different
solution to the people who actually experience the data missing problem.


>
> We can make node much more reliable for high-traffic HTTP servers
> (which is what node is explicitly for), at the expense of making it
> slightly worse at some approaches to high-CPU use cases (which is what
> the child process API is for), and make node's internals simpler in
> the process.
>
> We can't please everyone.  But please don't be so presumptuous to
> assume that this is somehow about library authors being out of touch
> with real world application developers.  This is about choosing which
> real world application developers to please, and which developers are
> going to have to change their code.
>

This is a very reasonable statement. You are making a hard decision. And
you are seeing the consequences of it. I hope it works out for you. I have
put up with a lot of changes in node that I didn't agree with and I suspect
that most people will get over this one as well. But it seems like you guys
also want warm and fuzzies every time you make these hard decisions. That's
probably not going to happen.

:Marco


-- 
Marco Rogers
[email protected] | https://twitter.com/polotek

Life is ten percent what happens to you and ninety percent how you respond
to it.
- Lou Holtz
