On Wed, Jun 15, 2011 at 4:33 PM, Graham Leggett <minf...@sharp.fm> wrote:
> On 16 Jun 2011, at 12:01 AM, Paul Querna wrote:
>
>> I think we have all joked on and off about 3.0 for... well about 8 years
>> now.
>>
>> I think we are nearing the point we might actually need to be serious
>> about it.
>>
>> The web has changed.
>>
>> SPDY is coming down the pipe pretty quickly.
>>
>> WebSockets might actually be standardized this year.
>>
>> Two protocols which HTTPD is unable to be good at. Ever.
>>
>> The problem is our process model, and our module APIs.
>
> I am not convinced.
>
> Over the last three years, I have developed a low level stream serving
> system that we use to disseminate diagnostic data across datacentres, and
> one of the basic design decisions was that it was to be lock-free and event
> driven, because above all it needed to be fast. The event driven stuff was
> done properly, based on religious application of the following rule:
>
> "Thou shalt not attempt any single read or write without the event loop
> giving you permission to do that single read or write first. Not a single
> attempt, ever."
>
> From that effort I've learned the following:
>
> - Existing APIs in unix and windows really, really suck at non-blocking
> behaviour. Standard APR file handling couldn't do it, so we couldn't use it
> properly. DNS libraries are really terrible at it. The vast majority of
> "async" DNS libraries are just hidden threads which wrap attempts to make
> blocking calls, which in turn means unknown resource limits are hit when you
> least expect it. Database and LDAP calls are blocking. What this means
> practically is that you can't link to most software out there.


Yes.

Don't use the existing APIs.

Use libuv for IO.

Use c-ares for DNS.

Don't use LDAP or databases in the event loop.  Not all content
generation needs to run in the main event loop, but a lot of content
generation and client handling should.
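
To make the DNS piece concrete, here is a minimal sketch of a c-ares
lookup driven from an event loop. This is illustration only, written
against c-ares 1.x, and it uses a bare select() loop where httpd's core
poller would really sit:

    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <ares.h>

    /* Invoked by c-ares when the lookup finishes; nothing here blocks. */
    static void on_resolved(void *arg, int status, int timeouts,
                            struct hostent *host) {
        (void)arg; (void)timeouts;
        if (status == ARES_SUCCESS)
            printf("resolved %s\n", host->h_name);
        else
            fprintf(stderr, "lookup failed: %s\n", ares_strerror(status));
    }

    int main(void) {
        ares_channel channel;
        ares_library_init(ARES_LIB_INIT_ALL);
        ares_init(&channel);
        ares_gethostbyname(channel, "example.org", AF_INET, on_resolved, NULL);

        /* Drive c-ares' sockets from the event loop; it never makes a
           blocking call of its own. */
        for (;;) {
            fd_set readers, writers;
            FD_ZERO(&readers); FD_ZERO(&writers);
            int nfds = ares_fds(channel, &readers, &writers);
            if (nfds == 0)
                break;                     /* no queries outstanding */
            struct timeval tv;
            select(nfds, &readers, &writers, NULL,
                   ares_timeout(channel, NULL, &tv));
            ares_process(channel, &readers, &writers);
        }
        ares_destroy(channel);
        ares_library_cleanup();
        return 0;
    }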

> - You cannot block, ever. Think you can cheat and just make a cheeky attempt
> to load that file quickly while nobody is looking? Your hard disk spins
> down, your network drive is slow for whatever reason, and your entire server
> stops dead in its tracks. We see this choppy behaviour in poorly written
> user interface code, we see the same choppy behaviour in cheating event
> driven webservers.

Node.js doesn't cheat.  It works fine.  It's not that hard to avoid
doing file I/O in the event loop thread.
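
That is exactly what libuv gives you when you pass a callback: the
blocking open()/read() runs on its worker thread pool and the result is
delivered back on the loop thread. A rough sketch (the path is only for
illustration, and exact signatures vary between libuv releases):

    #include <fcntl.h>
    #include <stdio.h>
    #include <uv.h>

    static uv_fs_t open_req;

    /* Runs on the loop thread after the thread pool finished the open(). */
    static void on_open(uv_fs_t *req) {
        if (req->result >= 0)
            printf("opened fd %d without blocking the loop\n", (int)req->result);
        else
            fprintf(stderr, "open failed: %s\n", uv_strerror((int)req->result));
        uv_fs_req_cleanup(req);
    }

    int main(void) {
        uv_loop_t *loop = uv_default_loop();
        /* Supplying a callback pushes the blocking syscall onto libuv's
           worker threads; the event loop thread keeps serving events. */
        uv_fs_open(loop, &open_req, "/etc/hosts", O_RDONLY, 0, on_open);
        return uv_run(loop, UV_RUN_DEFAULT);
    }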

> - You have zero room for error. Not a single mistake can be tolerated. One
> foot wrong, the event loop spins. Step one foot wrong the other way, and
> your task you were doing evaporates. Finding these problems is painful, and
> your server is unstable until you do.

This sounds like an implementation problem.  This is not a problem in Node.js.

> - You have to handle every single possible error condition. Every single
> one. Miss one? You suddenly drop out of an event handler, and your event
> loop spins, or the request becomes abandoned. You have no room for error at
> all.

I'm not suggesting the whole thing is trivial, but how is this worse
than our current situation?

> We have made our event driven code work because it does a number of very
> simple and small things, and it's designed to do these simple and small
> things well, and we want it to be as compact and fast as humanly possible,
> given that datacentre footprint is our primary constraint.
>
> Our system is like a sportscar, it's fast, but it breaks down if we break
> the rules. But for us, we are prepared to abide by the rules to achieve the
> speed we need.
>
> Let's contrast this with a web server.
>
> Webservers are traditionally fluid beasts, that have been and continue to be
> moulded and shaped that way through many many ever changing requirements
> from webmasters. They have been made modular and extensible, and those
> modules and extensions are written by people with different programming
> ability, to different levels of tolerances, within very different budget
> constraints.
>
> Simply put, webservers need to tolerate error. They need to be built like
> tractors.
>
> Unreliable code? We have to work despite that. Unhandled error conditions?
> We have to work despite that. Code that was written in a hurry on a budget?
> We have to work despite that.

You are confusing the 'core' network I/O model with fault isolation.
The Worker MPM has actually been quite good on most platforms for the
last decade.  There is little reason to use prefork anymore.

Should we run PHP inside the core event loop?  Hell no.

We can build reasonable fault isolation for modules that wish to have
it, and probably even make it the default.  If a module opts in, perhaps
through a separate API, it gets to run in the event loop.
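
Purely as a sketch of what "opting in" could look like (every name here
is invented for illustration; none of this is an existing httpd API), a
module could declare whether its handlers are safe on the loop thread
or must be pushed out to a worker:

    /* Hypothetical 3.0-style handler registration, illustration only. */
    struct ap3_request;                      /* opaque request handle */

    typedef enum {
        AP3_RUN_EVENT_LOOP,                  /* handler promises never to block */
        AP3_RUN_WORKER                       /* handler may block: PHP, LDAP, databases */
    } ap3_run_mode;

    typedef struct {
        const char   *name;
        ap3_run_mode  mode;                  /* how the core schedules the handler */
        int         (*handler)(struct ap3_request *r);
    } ap3_handler_decl;

    /* A blocking content generator registers for the worker pool and gets
       fault isolation; a purely non-blocking proxy or cache handler could
       register with AP3_RUN_EVENT_LOOP instead. */
    static int php_handler(struct ap3_request *r) { (void)r; return 0; }

    static const ap3_handler_decl php_decl = {
        "php", AP3_RUN_WORKER, php_handler
    };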

> Are we going to be sexy? Of course not. But while the sportscar is broken
> down at the side of the road, the tractor just keeps going.
>
> Why does our incredibly unsexy architecture help webmasters? Because prefork
> is bulletproof. Leak, crash, explode, hang, the parent will clean up after
> us. Whatever we do, within reason, doesn't affect the process next door. If
> things get really dire, we're delayed for a while, and we recover when the
> problems pass. Does the server die? Pretty much never. What if we trust our
> code? Well, worker may help us. Crashes do affect the request next door, but
> if they're rare enough we can tolerate it. The event MPM? It isn't truly an
> event MPM; it is rather more efficient when it comes to keepalives and
> waiting for connections, where we hand this problem to an event loop that
> doesn't run anyone else's code within it, so we're still reliable despite
> the need for a higher standard of code accuracy.

Wait, so you are saying the only valid configuration of httpd is
prefork?  This doesn't match at all how I've been using httpd for the
last 5 years.

> If you've ever been in a situation where a company demands more speed out of
> a webserver, wait until you sacrifice reliability giving them the speed.
> Suddenly they don't care about the speed, reliability becomes top priority
> again, as it should be.
>
> So, to get round to my point: if we decide to revisit the architecture for
> v3.0, we should be careful to ensure that we don't stop offering a "tractor
> mode", as this mode is our killer feature. There are enough webservers out
> there that try to be event driven and sexy, and then fall over on
> reliability. Or alternatively, there are webservers out there that try to be
> event driven and sexy, and succeed at doing so because they keep their
> feature set modest, keep extensibility to a minimum and avoid touching
> blocking calls to disks and other blocking devices. Great for load
> balancers, not so great for anything else.
>
> Apache httpd has always had at its heart the ability to be practically
> extensible, while remaining reliable, and I think we should continue to do
> that.

I think, as Stefan alludes to, there is a reasonable middle ground where
network I/O is done well in an event loop, but we can still maintain
easy extensibility, with multi-process and multi-thread systems for
content generators that have their own needs, like file I/O.
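
One way that middle ground could be wired up, sketched here with
libuv's generic work queue (handler names invented for illustration,
and signatures vary by libuv release): the loop accepts and parses,
anything that might block runs on a worker, and the response goes out
on the loop thread when the worker reports back.

    #include <stdlib.h>
    #include <uv.h>

    /* Hypothetical per-request state, illustration only. */
    typedef struct {
        uv_work_t work;
        int       status_code;
    } request_ctx;

    /* Runs on a worker thread: free to block on disk, LDAP, a database. */
    static void generate_content(uv_work_t *req) {
        request_ctx *ctx = (request_ctx *)req->data;
        ctx->status_code = 200;              /* pretend we did blocking work */
    }

    /* Back on the loop thread: queue the response bytes, non-blocking. */
    static void content_done(uv_work_t *req, int status) {
        request_ctx *ctx = (request_ctx *)req->data;
        (void)status;
        /* ...hand ctx->status_code and the body to the connection... */
        free(ctx);
    }

    /* Called from the event loop once a request has been parsed. */
    static void dispatch_request(uv_loop_t *loop) {
        request_ctx *ctx = calloc(1, sizeof(*ctx));
        ctx->work.data = ctx;
        uv_queue_work(loop, &ctx->work, generate_content, content_done);
    }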

But certain things in the core, like SSL, must be done right, and done
in an evented way.  It'll be hard, but we are programmers, after all,
aren't we?
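
To be concrete about the SSL part, here is a minimal sketch of a
non-blocking TLS read with OpenSSL. The conn_wait_* helpers are
placeholders for whatever the core loop would provide to re-arm
interest on the socket, and the socket itself has to be non-blocking:

    #include <openssl/ssl.h>

    /* Placeholder hooks into the core event loop; not real httpd functions. */
    extern void conn_wait_readable(SSL *ssl);
    extern void conn_wait_writable(SSL *ssl);

    /* Called only when the event loop says the socket is ready; never blocks.
       Returns bytes read, 0 if we must wait for another event, -1 on error. */
    static int evented_tls_read(SSL *ssl, char *buf, int len) {
        int n = SSL_read(ssl, buf, len);
        if (n > 0)
            return n;                        /* plaintext for the request parser */

        switch (SSL_get_error(ssl, n)) {
        case SSL_ERROR_WANT_READ:
            conn_wait_readable(ssl);         /* resume when readable */
            return 0;
        case SSL_ERROR_WANT_WRITE:
            conn_wait_writable(ssl);         /* handshake/renegotiation wants a write */
            return 0;
        default:
            return -1;                       /* hard error: tear the connection down */
        }
    }

The same WANT_READ/WANT_WRITE handling applies to SSL_write() and the
handshake, which is the part people usually get wrong.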
