On Fri, Jun 1, 2012 at 12:19 PM, Isaac Schlueter <[email protected]> wrote:
> First of all, Mikeal Rogers is not a "core guy who doesn't write
> apps". Mikeal Rogers has written several apps with node, and has
> authored very few patches to node core. He has a startup, and has
> been building applications with Node since the 0.1 days.

Being a core guy has nothing to do with how many patches you have in core or whether you also build apps. Felix builds apps. I have patches in core but also build apps. Being a "core guy" means you have a tremendous amount of influence on the way things go. If Mikeal did not agree with this change, we would likely not be talking about it. I'm not saying you wouldn't want to fix the core problem, but if Mikeal wanted to protect nextTick for whatever reason, then changing it wouldn't be on the table. As opposed to the rest of us, who have to convince you not to change it once you've already decided it should be. That's what's happening here.

> And yes, I work on the node core project, but I'm also a Joyent
> employee. My thoughts on this came from debugging node in production
> applications at scale, especially the memory leaks and HTTP errors
> that Voxer has seen on a regular basis. Also, some of these problems
> have manifested in the node programs that Joyent uses to run data
> centers for real customers, pushing loads of network traffic.
>
> Our complicated nextTick semantics are causing real problems for real
> applications. There is no ivory-tower-ism here. Node exists in the
> real world. That's why we need to make nextTick work properly for the
> cases where it's actually used. Complying with some notion of the
> "correct" meaning of "tick" and "next" is completely not a priority.

Bullshit. Yes, the tick semantics are complicated. But I think what you've seen is that lots of people have built up their own mental model of how the event loop works and use that mental model to good effect. These mental models may be flawed but they are important.
Reasoning about the async nature of node is essential to getting it right. The reason this is even a problem is at least partially due to getting this wrong in the first place. I agree that this is about real applications, and I don't like that people are jumping to conclusions. I'm referring to the meaning of "tick" and how important it is to people understanding the "recommended" way of writing proper node programs. Naming and consistency are a big part of building this understanding. Please don't ignore this.

> I didn't ask for feedback as a joke. I asked for feedback because I
> wanted to know what problems we would encounter if we changed nextTick
> so that it actually works for its intended use case. The conclusion
> that I eventually drew (the plan for node v0.9) is informed by that
> feedback, and was not known at the outset.
>
> There were two use cases presented in objection to the proposal:
> 1. Using process.nextTick to "break up" CPU intensive operations.
> 2. Using process.nextTick as an idle listener.
>
> #2 is a valid use case for which there is no reasonable API at the
> moment, but nextTick is *not* an idle listener anyway, since it will
> frequently be fired in advance of pending IO. setTimeout(fn) is
> better, but we probably ought to implement setImmediate or something
> similar. That was new information, and is very useful. Thank you,
> everyone.
>
> #1 is not a valid use case. nextTick is horrible for this, I just
> can't put it any other way. Either the operations are fast enough to
> be done in a single thread, and a loop is fine, or they're not, and
> you need to put them into a child process.

This is where your hubris is showing. Making this kind of presumption requires you to dictate the constraints of another person's system. Maybe your dataset doesn't stream. Maybe you can't use redis or memcache for some reason. You keep using the wrong word to describe #1. It IS a valid use case. It's just that you don't recommend it.
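
For concreteness, use case #1 from the quoted list might look something like the sketch below. It is only an illustration: `sumInChunks`, `chunkSize`, and `defer` are made-up names, and the default of `setImmediate` assumes a Node version that ships it (the quoted message only proposes adding it). Passing `process.nextTick` as `defer` gives the pattern being argued about in this thread.

```javascript
// Hypothetical helper: sum a large array in slices instead of one long loop.
// `defer` is whatever scheduling function the caller wants between slices.
function sumInChunks(items, chunkSize, done, defer) {
  defer = defer || setImmediate; // assumption: setImmediate is available
  let total = 0;
  let index = 0;

  function step() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      total += items[index];
    }
    if (index < items.length) {
      defer(step); // give up the stack before the next slice
    } else {
      done(total);
    }
  }

  defer(step); // never call `done` synchronously, even for tiny inputs
}

// Usage: the same work, deferred two different ways.
const numbers = Array.from({ length: 1000 }, (_, i) => i + 1);
sumInChunks(numbers, 100, (total) => console.log('setImmediate:', total));
sumInChunks(numbers, 100, (total) => console.log('nextTick:', total), process.nextTick);
```

Whether the deferred step actually lets pending I/O run between slices depends on which scheduling function is passed in, which is exactly the point in dispute here.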
You are also completely ignoring the fact that this use only became "not recommended" very recently, when in fact we have been actively pushing people to nextTick to defer execution for a long time now. Now you've decided that it's not only "not recommended" but a "bad idea". This is the height of presumption.

> Meanwhile, we have lots of real world cases where nextTick is causing
> actual problems for real applications. It's adding latency and
> causing errors to be thrown. Phone conversations are interrupted by
> it, and error pages are showing up in web browsers and applications.
> It's an insidious bug that only manifests under load, and it must be
> fixed.
>
> Also, it seems that the documentation of process.nextTick needs to be
> improved along with this change, because there is widespread
> misunderstanding about how it ought to be used. The impression I'm
> getting is that a lot of people treat nextTick a bit like a background
> fork, and it's really not a good idea to use it that way.

Yes, the reason there is a widespread misunderstanding is that it was not misunderstood until a few days ago. And to be clear, nextTick is not being used as a "background fork". It's being used as an efficient way to kill the current stack, yield ongoing execution, and let the event loop continue. It's not about explicitly supporting "CPU intensive operations". For most people it has always been good practice to do this before calling a callback when you can't be sure the call is async. This has been a best practice of node for as long as I can remember. Yielding to the event loop periodically protects your throughput.

Consider functions that check an in-memory cache and return without doing i/o. You don't want the callback to be synchronous, so you throw it into a nextTick call. That was literally a best practice until 2 days ago. Now it's a terrible idea.
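
The cache pattern described above can be sketched roughly as follows. The names `cache`, `slowLookup`, and `get` are invented for illustration, and `slowLookup` only simulates I/O with a timer; `process.nextTick` is the one real API under discussion.

```javascript
// Hypothetical in-memory cache sitting in front of a slow lookup.
const cache = new Map();

// Stand-in for real I/O; `slowLookup` is an assumed name, not a Node API.
function slowLookup(key, cb) {
  setTimeout(() => cb(null, 'value-for-' + key), 10);
}

function get(key, cb) {
  if (cache.has(key)) {
    const value = cache.get(key);
    // Cache hit: no I/O is needed, but defer the callback so callers
    // always see it fire after get() has returned.
    process.nextTick(() => cb(null, value));
    return;
  }
  // Cache miss: the callback is already async because the lookup is.
  slowLookup(key, (err, value) => {
    if (err) return cb(err);
    cache.set(key, value);
    cb(null, value);
  });
}

// Usage: on a hit, the callback still runs after the calling code finishes.
const order = [];
cache.set('a', 1);
get('a', () => order.push('callback'));
order.push('after get()');
```

On a hit the callback still fires after `get()` returns, so callers see the same ordering for hits and misses; the open question in this thread is what else gets to run before that deferred callback does.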
Why? Because if your system ends up doing this often, you are negatively affecting your i/o throughput, since you're not actually yielding to pending i/o. In practice, I don't think this will happen often because most people are doing real i/o. But IMO, this is just as likely to bite people as what is biting people right now at high load. In fact, it's more insidious because people's programs might not fail. They'll just see their throughput profiles change and not really be sure why.

> Are there other new use cases not already brought up in this thread?
> If so, please share them. They may make a big difference.
>
> For those who feel disenfranchised by this decision, please consider
> where I'm coming from. Half the people in node assumed that it worked
> one way, and the other half believed it worked the other way, and in
> fact, both were at least partly wrong. It's failing at its intended
> purpose, and causing http to be subtly broken in some cases, which is
> causing real world problems.

We really would've tried to consider where you were coming from if you and Mikeal had not started telling everyone how their applications were bad because they dared to actually use nextTick in a context you don't approve of (again, a seemingly recent development).

As for how broken nextTick is, I think you're still missing the point of why people feel disenfranchised. nextTick is behaving badly for some people. And it's the people you are closest to. Your friends and acquaintances who are building node apps, Joyent customers who have huge node deployments, or whatever. That's all fine and it's great that you are trying to address their concerns. But as I said before, to say nextTick is "broken" is to ignore everyone else who's telling you that it's not and that we are using it just fine. You are confusing the importance of your problem with the importance of the solution you've chosen.
And my feeling is that the amount of code that will have to change because of this will be far greater than the amount that would change if we provided a different solution to the people who actually experience the data missing problem.

> We can make node much more reliable for high-traffic HTTP servers
> (which is what node is explicitly for), at the expense of making it
> slightly worse at some approaches to high-CPU use cases (which is what
> the child process API is for), and make node's internals simpler in
> the process.
>
> We can't please everyone. But please don't be so presumptuous to
> assume that this is somehow about library authors being out of touch
> with real world application developers. This is about choosing which
> real world application developers to please, and which developers are
> going to have to change their code.

This is a very reasonable statement. You are making a hard decision. And you are seeing the consequences of it. I hope it works out for you. I have put up with a lot of changes in node that I didn't agree with, and I suspect that most people will get over this one as well. But it seems like you guys also want warm and fuzzies every time you make these hard decisions. That's probably not going to happen.

:Marco

--
Marco Rogers
[email protected] | https://twitter.com/polotek

Life is ten percent what happens to you and ninety percent how you respond to it. - Lou Holtz
