Re: Kill a request nicely

2011-06-15 Thread Ray Morris
   I am writing an output filter module that will under some
   circumstances want to send back an HTTP Error Code and kill
   the request without sending back the content.
...
  The only way you can affect a response status is to return an
  error to your caller before passing anything down the chain.
...

 So what option do I have? Cache all the buckets of the request without
 passing anything down the chain so I can make the decision about what
 to do? 

Obviously reading in the whole response is asking for trouble, mainly
a memory bomb when the output you're filtering is larger than you
planned. What to do, I think, depends on what your module is trying
to accomplish. What sort of error might happen in the middle of the
content?

First, consider whether or not your module is actually a filter. 
That is, does it transform the content from one representation
to another, or is it actually something else, like an access,
authorization or authentication module?  At least in my mind, a 
filter is like mod_deflate or chunking - something that changes 
how the content is represented, but doesn't fundamentally change 
the content itself. If it sometimes changes 200 OK content into 
some error, it may not be an output filter. Secondly, can the error 
be detected earlier with some proactive checking?
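
If the check can be done up front, a hook-phase module avoids the whole
problem, because nothing has been sent to the client yet. A rough,
untested sketch of that idea (request_is_acceptable() is a made-up
placeholder, not a real httpd call):

#include "httpd.h"
#include "http_config.h"
#include "http_request.h"

/* Placeholder for whatever proactive check the module needs. */
static int request_is_acceptable(request_rec *r)
{
    (void)r;
    return 1;
}

/* Running at fixups (or access_checker) means nothing has gone to the
 * client yet, so returning an error status still works cleanly and
 * httpd generates the error response for us. */
static int proactive_check(request_rec *r)
{
    if (!request_is_acceptable(r)) {
        return HTTP_FORBIDDEN;
    }
    return DECLINED;   /* let other modules have their say */
}

static void register_hooks(apr_pool_t *p)
{
    ap_hook_fixups(proactive_check, NULL, NULL, APR_HOOK_MIDDLE);
}

module AP_MODULE_DECLARE_DATA proactive_check_module = {
    STANDARD20_MODULE_STUFF,
    NULL, NULL, NULL, NULL, NULL,
    register_hooks
};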
-- 
Ray Morris
supp...@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php




On Tue, 14 Jun 2011 21:08:28 -0500
Jason Funk jasonlf...@gmail.com wrote:

 So what option do I have? Cache all the buckets of the request without
 passing anything down the chain so I can make the decision about what
 to do? Would that even require caching them in the filter's ctx?
 
 Do I have any other choice?
 
 On Tue, Jun 14, 2011 at 5:12 PM, Nick Kew n...@apache.org wrote:
 
  On Tue, 14 Jun 2011 16:31:22 -0500
  Jason Funk jasonlf...@gmail.com wrote:
 
   I am writing an output filter module that will under some
   circumstances
  want
   to send back an HTTP Error Code and kill the request without
   sending back the content.
 
  You can't set an HTTP response code when one has already been sent
  down the wire to the client.  It's in the nature of an output
  filter that the work is done in a callback, which (in general)
  only happens after the response has started, and too late to
  set a response code or headers.  Thus the filter callback API
  isn't designed to set a status like a processing hook can
  when it determines the outcome of a request.
 
  The only way you can affect a response status is to return an
  error to your caller before passing anything down the chain.
 
 
  --
  Nick Kew
 
  Available for work, contract or permanent.
  http://www.webthing.com/~nick/cv.html
 



Re: Kill a request nicely

2011-06-15 Thread Joe Lewis
On Wed, 2011-06-15 at 13:11 -0500, Jason Funk wrote:


 
 User Makes Request -> Web Server processes and generates output -> My
 module analyzes output and determines whether it should be passed back to
 the user or not.



Sounds like you have the right one, an output filter.  However, should
it really just delete the content it is checking for, rather than trying
to force an error response to the browser?  Or are you trying to end
with a 403 Forbidden?

There are output filters that can change things BEFORE the headers
are sent (see AP_FTYPE_RESOURCE).  PHP is one of those that behaves this
way.  Remember, though, that sending a bucket brigade on to the next filter
may result in the headers being sent.

If you use an output filter, loop through the buckets (don't flatten
them) to ensure everything is okay before passing to the next filter.
If something is wrong, you can create a new bucket brigade and send that
on instead.
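
Very roughly, that inspection loop might look like this (untested sketch;
bucket_is_bad() is a made-up placeholder, and as Nick says, the error
return only helps if nothing has been passed down the chain yet):

#include "httpd.h"
#include "http_protocol.h"
#include "util_filter.h"
#include "apr_buckets.h"

/* Placeholder for the real content check. */
static int bucket_is_bad(const char *data, apr_size_t len)
{
    (void)data; (void)len;
    return 0;
}

static apr_status_t inspect_out_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    apr_bucket *b;

    for (b = APR_BRIGADE_FIRST(bb);
         b != APR_BRIGADE_SENTINEL(bb);
         b = APR_BUCKET_NEXT(b)) {
        const char *data;
        apr_size_t len;

        if (APR_BUCKET_IS_METADATA(b)) {
            continue;   /* EOS, FLUSH, etc. */
        }
        if (apr_bucket_read(b, &data, &len, APR_BLOCK_READ) != APR_SUCCESS) {
            return HTTP_INTERNAL_SERVER_ERROR;
        }
        if (bucket_is_bad(data, len)) {
            /* Nothing has been passed down yet, so an error can still
             * become the response status.  Exactly how to signal it
             * (plain error return, error bucket, etc.) varies with where
             * the filter sits and the httpd version. */
            apr_brigade_cleanup(bb);
            return HTTP_FORBIDDEN;
        }
    }
    return ap_pass_brigade(f->next, bb);
}

/* Registered early enough to see the content before the headers go out:
 * ap_register_output_filter("INSPECT", inspect_out_filter, NULL,
 *                           AP_FTYPE_RESOURCE);
 */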

Joe Lewis
-- 
Director - Systems Administration
http://www.silverhawk.net/


Re: mod_lua Filter Hook?

2011-06-15 Thread Joachim Zobel
On Mon, 2011-06-13 at 11:14 -0600, Brian McCallister wrote:
 I'd very much like to support filters in mod_lua, but have not had a
 chance to figure out the best way to do it. Help would be VERY
 appreciated :-) 

mod_perl does this pretty well, IMHO. The point is auto-generation of
usable API wrappers.

What do the different options for adding filter support look like?

Sincerely,
Joachim






Re: mod_lua Filter Hook?

2011-06-15 Thread Akins, Brian
On 6/15/11 4:08 PM, Joachim Zobel jzo...@heute-morgen.de wrote:

 mod_perl does this pretty good IMHO. The point is auto generation of
 usable API wrappers.

FWIW, and this is just my opinion, but I'm not 100% sure that having a
complete (or nearly complete) Lua version of the HTTPD (and APR) API is really
worth the effort.  I've grown to like the very simple way some other web
servers have done it.  I've also learned to write as much code in Lua as
possible and just have the low-level glue in C, especially with the JIT.

For filters, etc, not sure we really need buckets in Lua.  Maybe just
represent them as a table of buffers or something simple like that.
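
Just to make that concrete, the C-side glue could be as dumb as flattening
a brigade into a Lua table of strings before calling a user-supplied Lua
filter function. A rough sketch of the idea, not mod_lua's actual API:

#include "apr_buckets.h"
#include "lua.h"

/* Leaves a table of content strings on top of the Lua stack; metadata
 * buckets (EOS, FLUSH) are skipped.  A filter written in Lua could then
 * just receive and return plain strings. */
static void brigade_to_lua_table(lua_State *L, apr_bucket_brigade *bb)
{
    apr_bucket *b;
    int i = 1;

    lua_newtable(L);
    for (b = APR_BRIGADE_FIRST(bb);
         b != APR_BRIGADE_SENTINEL(bb);
         b = APR_BUCKET_NEXT(b)) {
        const char *data;
        apr_size_t len;

        if (APR_BUCKET_IS_METADATA(b)) {
            continue;
        }
        if (apr_bucket_read(b, &data, &len, APR_BLOCK_READ) == APR_SUCCESS) {
            lua_pushlstring(L, data, len);
            lua_rawseti(L, -2, i++);
        }
    }
}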

-- 
Brian Akins




Re: mod_lua Filter Hook?

2011-06-15 Thread Joachim Zobel
On Wed, 2011-06-15 at 17:04 -0400, Akins, Brian wrote:

 For filters, etc, not sure we really need buckets in Lua.  Maybe just
 represent them as a table of buffers or something simple like that.

This misses my (admittedly not so important) use case of SAX buckets.
See
http://www.heute-morgen.de/site/03_Web_Tech/50_Building_an_Apache_XML_Rewriting_Stack.shtml

Sincerely,
Joachim





3.0, the 2011 thread.

2011-06-15 Thread Paul Querna
I think we have all joked on and off about 3.0 for... well about 8 years now.

I think we are nearing the point we might actually need to be serious about it.

The web has changed.

SPDY is coming down the pipe pretty quickly.

WebSockets might actually be standardized this year.

Two protocols which HTTPD is unable to be good at. Ever.

The problem is our process model, and our module APIs.

The Event MPM was a valiant effort in some ways, but mod_ssl and other
filters will always block its progress, and with protocols like SPDY,
falling back to Worker MPM behaviors is pointless.

I think there are exciting things happening in C however.

Four projects that could maybe form the baseline for something new:

pocore: For base OS portability and memory pooling system.
  http://code.google.com/p/pocore/
libuv: Portable, fast, Network IO. (IOCP programming model, brought to Unix)
  https://github.com/joyent/libuv
http-parser: HTTP really broken out to simple callbacks.
  https://github.com/ry/http-parser
selene: SSL, redone to better support Async IO.
  https://github.com/pquerna/selene

All of these are young.  Most are incomplete.

But they could be the tools to build a real 3.0 upon.

If we don't, I'm sure others in the web server market will continue to
gain market share.

But I think we could do it better.  We have the experience, we
know the value of a module ecosystem, and we build stable, quality
software.  We just need to step up to how the internet is changing.

Thoughts?

Thanks,

Paul


Re: Kill a request nicely

2011-06-15 Thread Nick Kew
On Wed, 15 Jun 2011 13:11:43 -0500
Jason Funk jasonlf...@gmail.com wrote:

 User Makes Request -> Web Server processes and generates output -> My
 module analyzes output and determines whether it should be passed back to
 the user or not.

mod_security?

If it doesn't do what you need, it should at least be
a good starting point to hack on.


-- 
Nick Kew

Available for work, contract or permanent.
http://www.webthing.com/~nick/cv.html


Re: mod_lua Filter Hook?

2011-06-15 Thread HyperHacker
On Wed, Jun 15, 2011 at 15:34, Joachim Zobel jzo...@heute-morgen.de wrote:
 On Wed, 2011-06-15 at 17:04 -0400, Akins, Brian wrote:

 For filters, etc, not sure we really need buckets in Lua.  Maybe just
 represent them as a table of buffers or something simple like that.

 This misses my (admittedly not so important) use case of sax buckets.
 See
 http://www.heute-morgen.de/site/03_Web_Tech/50_Building_an_Apache_XML_Rewriting_Stack.shtml

 Sincerely,
 Joachim





I'd been looking forward to mod_lua for a while now expecting it to
work similarly to PHP (handle requests, send output without having to
worry about how the httpd works). Is that not the case?

-- 
Sent from my toaster.


Re: 3.0, the 2011 thread.

2011-06-15 Thread Akins, Brian
On 6/15/11 6:01 PM, Paul Querna p...@querna.org wrote:

 pocore: For base OS portability and memory pooling system.
   http://code.google.com/p/pocore/

How does this compare to APR?

 libuv: Portable, fast, Network IO. (IOCP programming model, brought to Unix)
   https://github.com/joyent/libuv

I've played with it.  It's rough - particularly dealing with memory.

 http-parser: HTTP really broken out to simple callbacks.
   https://github.com/ry/http-parser

I like this one a lot.

 selene: SSL, redone to better support Async IO.
   https://github.com/pquerna/selene

Haven't had a chance.
 

+1 to the idea.  I still like Lua ;) People said I was crazy when I said Lua
should be the config and the runtime - now look at node.js

-- 
Brian Akins




Re: 3.0, the 2011 thread.

2011-06-15 Thread Paul Querna
On Wed, Jun 15, 2011 at 3:26 PM, Akins, Brian brian.ak...@turner.com wrote:
 On 6/15/11 6:01 PM, Paul Querna p...@querna.org wrote:

 pocore: For base OS portability and memory pooling system.
   http://code.google.com/p/pocore/

 How does this compare to APR?

It's like an APR version 3.0.

It has a faster pools system, with the ability to free() items, and it
drops all of the apr-util things like databases, LDAP, etc.


Re: mod_lua Filter Hook?

2011-06-15 Thread Akins, Brian
On 6/15/11 6:26 PM, HyperHacker hyperhac...@gmail.com wrote:

 I'd been looking forward to mod_lua for a while now expecting it to
 work similarly to PHP (handle requests, send output without having to
 worry about how the httpd works). Is that not the case?

Brian M. can correct me, but the original intent for mod_lua (née
mod_wombat) was for when you needed to get at the internals of Apache, but
didn't want to write a full-on C module.  Like needing nested ifs for a
rewrite, a strange auth method, or whatever.  It was not really meant to be
a competitor to PHP, Ruby, Python, etc. for application development.

-- 
Brian Akins




Re: 3.0, the 2011 thread.

2011-06-15 Thread Graham Leggett

On 16 Jun 2011, at 12:01 AM, Paul Querna wrote:

I think we have all joked on and off about 3.0 for... well about 8  
years now.


I think we are nearing the point we might actually need to be  
serious about it.


The web has changed.

SPDY is coming down the pipe pretty quickly.

WebSockets might actually be standardized this year.

Two protocols which HTTPD is unable to be good at. Ever.

The problem is our process model, and our module APIs.


I am not convinced.

Over the last three years, I have developed a low level stream serving  
system that we use to disseminate diagnostic data across datacentres,  
and one of the basic design decisions was that  it was to be lock free  
and event driven, because above all it needed to be fast. The event  
driven stuff was done properly, based on religious application of the  
following rule:


Thou shalt not attempt any single read or write without the event  
loop giving you permission to do that single read or write first. Not  
a single attempt, ever.


From that effort I've learned the following:

- Existing APIs in unix and windows really really suck at non blocking  
behaviour. Standard APR file handling couldn't do it, so we couldn't  
use it properly. DNS libraries are really terrible at it. The vast  
majority of async DNS libraries are just hidden threads which wrap  
attempts to make blocking calls, which in turn means unknown resource  
limits are hit when you least expect it. Database and LDAP calls are  
blocking. What this means practically is that you can't link to most  
software out there.


- You cannot block, ever. Think you can cheat and just make a cheeky  
attempt to load that file quickly while nobody is looking? Your hard  
disk spins down, your network drive is slow for whatever reason, and  
your entire server stops dead in its tracks. We see this choppy  
behaviour in poorly written user interface code, we see the same  
choppy behaviour in cheating event driven webservers.


- You have zero room for error. Not a single mistake can be tolerated.  
One foot wrong, the event loop spins. Step one foot wrong the other  
way, and the task you were doing evaporates. Finding these problems
is painful, and your server is unstable until you do.


- You have to handle every single possible error condition. Every  
single one. Miss one? You suddenly drop out of an event handler, and  
your event loop spins, or the request becomes abandoned. You have no  
room for error at all.


We have made our event driven code work because it does a number of  
very simple and small things, and it's designed to do these simple and  
small things well, and we want it to be as compact and fast as humanly  
possible, given that datacentre footprint is our primary constraint.


Our system is like a sportscar, it's fast, but it breaks down if we  
break the rules. But for us, we are prepared to abide by the rules to  
achieve the speed we need.


Let's contrast this with a web server.

Webservers are traditionally fluid beasts, that have been and continue  
to be moulded and shaped that way through many many ever changing  
requirements from webmasters. They have been made modular and  
extensible, and those modules and extensions are written by people  
with different programming ability, to different levels of tolerances,  
within very different budget constraints.


Simply put, webservers need to tolerate error. They need to be built  
like tractors.


Unreliable code? We have to work despite that. Unhandled error  
conditions? We have to work despite that. Code that was written in a  
hurry on a budget? We have to work despite that.


Are we going to be sexy? Of course not. But while the sportscar is  
broken down at the side of the road, the tractor just keeps going.


Why does our incredibly unsexy architecture help webmasters? Because  
prefork is bulletproof. Leak, crash, explode, hang, the parent will  
clean up after us. Whatever we do, within reason, doesn't affect the  
process next door. If things get really dire, we're delayed for a  
while, and we recover when the problems pass. Does the server die?  
Pretty much never. What if we trust our code? Well, worker may help  
us. Crashes do affect the request next door, but if they're rare  
enough we can tolerate it. The event MPM? It isn't truly an event MPM;
it is rather more efficient when it comes to keepalives and waiting  
for connections, where we hand this problem to an event loop that  
doesn't run anyone else's code within it, so we're still reliable  
despite the need for a higher standard of code accuracy.


If you've ever been in a situation where a company demands more speed  
out of a webserver, wait until you sacrifice reliability giving them  
the speed. Suddenly they don't care about the speed, reliability  
becomes top priority again, as it should be.


So, to get round to my point. If we decide to relook at the  
architecture of v3.0, we should be careful to ensure that we don't 

Re: 3.0, the 2011 thread.

2011-06-15 Thread Colm MacCárthaigh
On Wed, Jun 15, 2011 at 3:01 PM, Paul Querna p...@querna.org wrote:
 I think we have all joked on and off about 3.0 for... well about 8 years now.

At least!

 I think there are exciting things happening in C however.

I love C, but unless we can come up with something radical, it's hard
to see a way out of the prison it creates. That realisation led me to
hacking mostly on functional-oriented servers. I'll try to explain why
- in case any of those thoughts are useful here too :)

I like the things you've pointed out, but they seem relatively
cosmetic. Things like the parser, async, event and portability
frameworks are really cool - but hardly fundamental. Anyone could use
those, in any language - it's not a real leap in the field. Similarly,
SPDY, websockets, COMET and so on are ultra-cool - but are still
potential bolt-ons to almost any kind of webserver. It sucks that we
don't do them well, but doing them better won't fundamentally change
the market or the pressures on adoption.

Today webservers are almost entirely network I/O bound - disk seek and
CPU speeds are pretty great these days, way faster than is really
necessary. In a properly architected set-up, end-user delay is really
about the limitations of TCP. You can multiplex and use keepalives as
much as you want, but you'll eventually realise that the size of the world
and speed of light mean that this inevitably ends up being slow
without a lot of distributed endpoints.

But we have some cool secret sauce to help fix that. I think the best
architectural thing about Apache is buckets and brigades. Using a list
structure to represent portions of differently-generated content like
that is great. Imagine how much better WordPress would run if PHP
represented the PHP scripts as some dynamic buckets intermingled
with some static file I/O buckets (and even repeated when in loops).
There'd be a lot less data to copy around.
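
As a toy illustration of that mixing at the brigade level (a hedged sketch
of a hypothetical handler, not anything PHP actually does today): dynamic
strings and a static file share one brigade, and the file bucket carries a
reference to the file rather than a copy of its bytes.

#include "httpd.h"
#include "http_protocol.h"
#include "util_filter.h"
#include "apr_buckets.h"
#include "apr_file_io.h"

static int mixed_content_handler(request_rec *r)
{
    apr_bucket_brigade *bb =
        apr_brigade_create(r->pool, r->connection->bucket_alloc);
    apr_file_t *fd;
    apr_finfo_t finfo;

    /* dynamically generated content */
    apr_brigade_puts(bb, NULL, NULL, "<h1>dynamic header</h1>\n");

    /* static content referenced in place (sendfile-able) */
    if (apr_file_open(&fd, "/srv/www/static-part.html", APR_READ,
                      APR_OS_DEFAULT, r->pool) == APR_SUCCESS
        && apr_file_info_get(&finfo, APR_FINFO_SIZE, fd) == APR_SUCCESS) {
        apr_brigade_insert_file(bb, fd, 0, finfo.size, r->pool);
    }

    /* more dynamic content */
    apr_brigade_puts(bb, NULL, NULL, "<p>dynamic footer</p>\n");

    APR_BRIGADE_INSERT_TAIL(bb,
        apr_bucket_eos_create(r->connection->bucket_alloc));
    return ap_pass_brigade(r->output_filters, bb) == APR_SUCCESS
           ? OK : HTTP_INTERNAL_SERVER_ERROR;
}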

Now imagine a backend that could identify the dynamic buckets and, by
forbidding side effects, parallelise work on them - a bucket as a
message in a message-passing system of co-routines, for example. Imagine
that in turn feeding into a set of co-routine filters. That's
fundamentally different - it parallelises content generation, but it's
really really hard to do in C.

Next, imagine a backend that could identify the static buckets
and re-order them so that they come first - it could understand things
like XML and Javascript and intelligently front-load your transfer so
that the content we have ready goes first, while the dynamic stuff is
being built. It's a real layer-8-aware scheduler and content
re-compiler. Again it's really really hard to do in C - but imagine
the benefits of a server layer that really understood how to model and
re-order content.

These are the kinds of transform that make a webserver's job as optimal
as it can be. Network data is the most expensive part of any modern
web application, in terms of both time and money, so the ecosystem
faces huge economic pressure to make these as optimal as possible over
time. Things like SPDY are just the first generation.

It'd be cool if Apache 3.0 could do those things - we have some great
building blocks and experience - but it feels like a language with
support for first-order functions and co-routines would be better at
it.

Again, I'm just thinking out loud :)

-- 
Colm


Re: 3.0, the 2011 thread.

2011-06-15 Thread Joe Schaefer
+1 amen to reliability coming first. We run all kinds of awful code in 
production at the ASF, and httpd's design papers over that elegantly.  Losing 
that would be a terrible blow to the utility of the project.

Sent from my iPhone

On Jun 15, 2011, at 7:33 PM, Graham Leggett minf...@sharp.fm wrote:

...

Re: 3.0, the 2011 thread.

2011-06-15 Thread Akins, Brian
On 6/15/11 7:40 PM, Colm MacCárthaigh c...@allcosts.net wrote:
  Imagine
 that in turn feeding into a set of co-routine filters. That's
 fundamentally different - it parallelises content generation, but it's
 really really hard to do in C.

Depending on how far you want to push the model, it's not that hard.
Obviously you can't do co-routines, but just using the current ideas about
requests and subrequests, you could easily do the subrequests in parallel.
FWIW, nginx can use Lua co-routines to do this, and does it natively with
SSIs. The code, however, will make you go blind ;)

My biggest issue with HTTPD really comes down to connections per OS image.
In general, threads suck at this - memory per connection and context
switches just kill you.  C1M (a million concurrent connections) is just not
that hard to achieve nowadays.

-- 
Brian Akins




Re: mod_lua Filter Hook?

2011-06-15 Thread William A. Rowe Jr.
On 6/15/2011 6:09 PM, Akins, Brian wrote:
 On 6/15/11 6:26 PM, HyperHacker hyperhac...@gmail.com wrote:
 
 I'd been looking forward to mod_lua for a while now expecting it to
 work similarly to PHP (handle requests, send output without having to
 worry about how the httpd works). Is that not the case?
 
 Brian M. can correct me, but the original intent for mod_lua (née
 mod_wombat) was for when you needed to get at the internals of Apache, but
 didn't want to write a full-on C module.  Like needing nested ifs for a
 rewrite, a strange auth method, or whatever.  It was not really meant to be
 a competitor to PHP, Ruby, Python, etc. for application development.

E.g., much of mod_perl's flexibility without the overhead of a Perl
interpreter instance.