Re: Kill a request nicely
I am writing an output filter module that will under some circumstances want to send back an HTTP Error Code and kill the request without sending back the content. ... The only way you can affect a response status is to return an error to your caller before passing anything down the chain. ... So what option do I have? Cache all the buckets of the request without passing anything down the chain so I can make the decision about what to do?

Obviously reading in the whole request is asking for trouble, mainly a memory bomb when the output you're filtering is larger than you planned. What to do, I think, depends on what your module is trying to accomplish. What sort of error might happen in the middle of the content?

First, consider whether or not your module is actually a filter. That is, does it transform the content from one representation to another, or is it actually something else, like an access, authorization or authentication module? At least in my mind, a filter is like mod_deflate or chunking - something that changes how the content is represented, but doesn't fundamentally change the content itself. If it sometimes changes 200 OK content into some error, it may not be an output filter.

Secondly, can the error be detected earlier with some proactive checking?

-- Ray Morris supp...@bettercgi.com Strongbox - The next generation in site security: http://www.bettercgi.com/strongbox/ Throttlebox - Intelligent Bandwidth Control http://www.bettercgi.com/throttlebox/ Strongbox / Throttlebox affiliate program: http://www.bettercgi.com/affiliates/user/register.php

On Tue, 14 Jun 2011 21:08:28 -0500 Jason Funk jasonlf...@gmail.com wrote: So what option do I have? Cache all the buckets of the request without passing anything down the chain so I can make the decision about what to do? Would that even require caching them in the filter's ctx? Do I have any other choice?
On Tue, Jun 14, 2011 at 5:12 PM, Nick Kew n...@apache.org wrote: On Tue, 14 Jun 2011 16:31:22 -0500 Jason Funk jasonlf...@gmail.com wrote: I am writing an output filter module that will under some circumstances want to send back an HTTP Error Code and kill the request without sending back the content. You can't set an HTTP response code when one has already been sent down the wire to the client. It's in the nature of an output filter that the work is done in a callback, which (in general) only happens after the response has started, and too late to set a response code or headers. Thus the filter callback API isn't designed to set a status like a processing hook can when it determines the outcome of a request. The only way you can affect a response status is to return an error to your caller before passing anything down the chain. -- Nick Kew Available for work, contract or permanent. http://www.webthing.com/~nick/cv.html
Re: Kill a request nicely
On Wed, 2011-06-15 at 13:11 -0500, Jason Funk wrote: User Makes Request - Web Server processes and generates output - My module analyzes output and determines whether it should be passed back to the user or not.

Sounds like you have the right one, an output filter. However, should it really just delete the content it is checking for, rather than try to force an error response to the browser? Or are you trying to end with a 403 Forbidden?

There are output filters that can change things BEFORE the headers are sent (see AP_FTYPE_RESOURCE). PHP is one of those that behave this way. Remember, though, that sending a bucket brigade on to the next filter may result in the headers being sent. If you use an output filter, loop through the buckets (don't flatten them) to ensure everything is okay, before passing to the next filter. If not, you can create a new bucket brigade and send that on.

Joe Lewis -- Director - Systems Administration http://www.silverhawk.net/
Re: mod_lua Filter Hook?
On Mon, 2011-06-13 at 11:14 -0600, Brian McCallister wrote: I'd very much like to support filters in mod_lua, but have not had a chance to figure out the best way to do it. Help would be VERY appreciated :-) mod_perl does this pretty well, IMHO. The point is auto-generation of usable API wrappers. What do the different options for adding filter support look like? Sincerely, Joachim
Re: mod_lua Filter Hook?
On 6/15/11 4:08 PM, Joachim Zobel jzo...@heute-morgen.de wrote: mod_perl does this pretty well, IMHO. The point is auto-generation of usable API wrappers. FWIW, and this is just my opinion, but I'm not 100% sure that having a complete (or near-complete) Lua version of the HTTPD (and APR) API is really worth the effort. I've grown to like the very simple way some other web servers have done it. I've also learned to write as much code in Lua as possible and just have the low-level glue in C, especially with the jit. For filters, etc., I'm not sure we really need buckets in Lua. Maybe just represent them as a table of buffers or something simple like that. -- Brian Akins
Re: mod_lua Filter Hook?
On Wed, 2011-06-15 at 17:04 -0400, Akins, Brian wrote: For filters, etc, not sure we really need buckets in Lua. Maybe just represent them as a table of buffers or something simple like that. This misses my (admittedly not so important) use case of sax buckets. See http://www.heute-morgen.de/site/03_Web_Tech/50_Building_an_Apache_XML_Rewriting_Stack.shtml Sincerely, Joachim
3.0, the 2011 thread.
I think we have all joked on and off about 3.0 for... well, about 8 years now. I think we are nearing the point where we might actually need to be serious about it.

The web has changed. SPDY is coming down the pipe pretty quickly. WebSockets might actually be standardized this year. Two protocols which HTTPD is unable to be good at. Ever. The problem is our process model, and our module APIs. The Event MPM was a valiant effort in some ways, but mod_ssl and other filters will always block its progress, and with protocols like SPDY, falling back to Worker MPM behaviors is pointless.

I think there are exciting things happening in C however. 4 projects that maybe could form the baseline for something new:

pocore: For base OS portability and memory pooling system. http://code.google.com/p/pocore/
libuv: Portable, fast, Network IO. (IOCP programming model, brought to Unix) https://github.com/joyent/libuv
http-parser: HTTP really broken out to simple callbacks. https://github.com/ry/http-parser
selene: SSL, redone to better support Async IO. https://github.com/pquerna/selene

All of these are young. Most are incomplete. But they could be the tools to build a real 3.0 upon. If we don't, I'm sure others in the web server market will continue to gain market share. But I think we could do it better. We have the experience, we know the value of a module ecosystem, we build stable, quality software. We just need to step up to how the internet is changing.

Thoughts?

Thanks, Paul
Re: Kill a request nicely
On Wed, 15 Jun 2011 13:11:43 -0500 Jason Funk jasonlf...@gmail.com wrote: User Makes Request - Web Server processes and generates output - My module analyzes output and determines whether it should be passed back to the user or not. mod_security? If it doesn't do what you need, it should at least be a good starting point to hack on. -- Nick Kew Available for work, contract or permanent. http://www.webthing.com/~nick/cv.html
Re: mod_lua Filter Hook?
On Wed, Jun 15, 2011 at 15:34, Joachim Zobel jzo...@heute-morgen.de wrote: On Wed, 2011-06-15 at 17:04 -0400, Akins, Brian wrote: For filters, etc, not sure we really need buckets in Lua. Maybe just represent them as a table of buffers or something simple like that. This misses my (admittedly not so important) use case of sax buckets. See http://www.heute-morgen.de/site/03_Web_Tech/50_Building_an_Apache_XML_Rewriting_Stack.shtml Sincerely, Joachim I'd been looking forward to mod_lua for a while now expecting it to work similarly to PHP (handle requests, send output without having to worry about how the httpd works). Is that not the case? -- Sent from my toaster.
Re: 3.0, the 2011 thread.
On 6/15/11 6:01 PM, Paul Querna p...@querna.org wrote: pocore: For base OS portability and memory pooling system. http://code.google.com/p/pocore/ How does this compare to APR? libuv: Portable, fast, Network IO. (IOCP programming model, brought to Unix) https://github.com/joyent/libuv I've played with it. It's rough - particularly dealing with memory. http-parser: HTTP really broken out to simple callbacks. https://github.com/ry/http-parser I like this one a lot. selene: SSL, redone to better support Async IO. https://github.com/pquerna/selene Haven't had a chance. +1 to the idea. I still like Lua ;) People said I was crazy when I said Lua should be the config and the runtime - now look at node.js -- Brian Akins
Re: 3.0, the 2011 thread.
On Wed, Jun 15, 2011 at 3:26 PM, Akins, Brian brian.ak...@turner.com wrote: On 6/15/11 6:01 PM, Paul Querna p...@querna.org wrote: pocore: For base OS portability and memory pooling system. http://code.google.com/p/pocore/ How does this compare to APR? It's like an APR version 3.0. It has a faster pools system, with the ability to free() items, and it drops all of the apr-util-ish things like databases, ldap, etc.
Re: mod_lua Filter Hook?
On 6/15/11 6:26 PM, HyperHacker hyperhac...@gmail.com wrote: I'd been looking forward to mod_lua for a while now, expecting it to work similarly to PHP (handle requests, send output without having to worry about how the httpd works). Is that not the case? Brian M. can correct me, but the original intent for mod_lua (nee mod_wombat) was for when you needed to get at the internals of Apache, but didn't want to write a full-on C module. Like needing nested ifs for a rewrite, a strange auth method, or whatever. It was not really meant to be a competitor to php, ruby, python, etc. for application development. -- Brian Akins
Re: 3.0, the 2011 thread.
On 16 Jun 2011, at 12:01 AM, Paul Querna wrote: I think we have all joked on and off about 3.0 for... well about 8 years now. I think we are nearing the point we might actually need to be serious about it. The web has changed. SPDY is coming down the pipe pretty quickly. WebSockets might actually be standardized this year. Two protocols which HTTPD is unable to be good at. Ever. The problem is our process model, and our module APIs.

I am not convinced. Over the last three years, I have developed a low-level stream serving system that we use to disseminate diagnostic data across datacentres, and one of the basic design decisions was that it was to be lock free and event driven, because above all it needed to be fast. The event driven stuff was done properly, based on religious application of the following rule: thou shalt not attempt any single read or write without the event loop giving you permission to do that single read or write first. Not a single attempt, ever.

From that effort I've learned the following:

- Existing APIs in unix and windows really really suck at non-blocking behaviour. Standard APR file handling couldn't do it, so we couldn't use it properly. DNS libraries are really terrible at it. The vast majority of async DNS libraries are just hidden threads which wrap attempts to make blocking calls, which in turn means unknown resource limits are hit when you least expect it. Database and LDAP calls are blocking. What this means practically is that you can't link to most software out there.

- You cannot block, ever. Think you can cheat and just make a cheeky attempt to load that file quickly while nobody is looking? Your hard disk spins down, your network drive is slow for whatever reason, and your entire server stops dead in its tracks. We see this choppy behaviour in poorly written user interface code, and we see the same choppy behaviour in cheating event driven webservers.

- You have zero room for error. Not a single mistake can be tolerated. One foot wrong, and the event loop spins. Step one foot wrong the other way, and the task you were doing evaporates. Finding these problems is painful, and your server is unstable until you do.

- You have to handle every single possible error condition. Every single one. Miss one? You suddenly drop out of an event handler, and your event loop spins, or the request becomes abandoned. You have no room for error at all.

We have made our event driven code work because it does a number of very simple and small things, it's designed to do these simple and small things well, and we want it to be as compact and fast as humanly possible, given that datacentre footprint is our primary constraint. Our system is like a sportscar: it's fast, but it breaks down if we break the rules. But for us, we are prepared to abide by the rules to achieve the speed we need.

Let's contrast this with a web server. Webservers are traditionally fluid beasts, which have been and continue to be moulded and shaped that way through many, many ever-changing requirements from webmasters. They have been made modular and extensible, and those modules and extensions are written by people with different programming ability, to different levels of tolerance, within very different budget constraints.

Simply put, webservers need to tolerate error. They need to be built like tractors. Unreliable code? We have to work despite that. Unhandled error conditions? We have to work despite that. Code that was written in a hurry on a budget? We have to work despite that. Are we going to be sexy? Of course not. But while the sportscar is broken down at the side of the road, the tractor just keeps going.

Why does our incredibly unsexy architecture help webmasters? Because prefork is bulletproof. Leak, crash, explode, hang - the parent will clean up after us. Whatever we do, within reason, doesn't affect the process next door. If things get really dire, we're delayed for a while, and we recover when the problems pass. Does the server die? Pretty much never.

What if we trust our code? Well, worker may help us. Crashes do affect the request next door, but if they're rare enough we can tolerate it. The event MPM? It isn't truly an event MPM; rather, it is more efficient when it comes to keepalives and waiting for connections, where we hand this problem to an event loop that doesn't run anyone else's code within it, so we're still reliable despite the need for a higher standard of code accuracy.

If you've ever been in a situation where a company demands more speed out of a webserver, wait until you sacrifice reliability giving them the speed. Suddenly they don't care about the speed; reliability becomes top priority again, as it should be.

So, to get round to my point. If we decide to relook at the architecture of v3.0, we should be careful to ensure that we don't
Re: 3.0, the 2011 thread.
On Wed, Jun 15, 2011 at 3:01 PM, Paul Querna p...@querna.org wrote: I think we have all joked on and off about 3.0 for... well about 8 years now. At least!

I think there are exciting things happening in C however. I love C, but unless we can come up with something radical, it's hard to see a way out of the prison it creates. That realisation led me to hacking mostly on functional-oriented servers. I'll try to explain why - in case any of those thoughts are useful here too :)

I like the things you've pointed out, but they seem relatively cosmetic. Things like the parser, async, event and portability frameworks are really cool - but hardly fundamental. Anyone could use those, in any language - it's not a real leap in the field. Similarly, SPDY, websockets, COMET and so on are ultra-cool - but are still potential bolt-ons to almost any kind of webserver. It sucks that we don't do them well, but doing them better won't fundamentally change the market or the pressures on adoption.

Today webservers are almost entirely network I/O bound - disk seek and CPU speeds are pretty great these days, way faster than is really necessary. In a properly architected set-up, end-user delay is really about the limitations of TCP. You can multiplex and use keepalives as much as you want; you'll eventually realise that the size of the world and the speed of light mean that this inevitably ends up being slow without a lot of distributed endpoints. But we have some cool secret sauce to help fix that.

I think the best architectural thing about Apache is buckets and brigades. Using a list structure to represent portions of differently-generated content like that is great. Imagine how much better wordpress would run if PHP represented the php scripts as some dynamic buckets intermingled with some static file IO buckets (and even repeated when in loops). There'd be a lot less data to copy around.

Now imagine a backend that could identify the dynamic buckets and, by forbidding side effects, parallelise work on them - a bucket as a message in a message-passing system of co-routines, for example. Imagine that in turn feeding into a set of co-routine filters. That's fundamentally different - it parallelises content generation, but it's really really hard to do in C.

Then next, imagine a backend that could identify the static buckets and re-order them so that they come first - it could understand things like XML and Javascript and intelligently front-load your transfer so that the content we have ready goes first, while the dynamic stuff is being built. It's a real layer-8-aware scheduler and content re-compiler. Again, it's really really hard to do in C - but imagine the benefits of a server layer that really understood how to model and re-order content.

These are the kinds of transform that make a webserver's job as optimal as it can be. Network data is the most expensive part of any modern web application, in terms of both time and money, so the ecosystem faces huge economic pressure to make these as optimal as possible over time. Things like SPDY are just the first generation.

It'd be cool if Apache 3.0 could do those things - we have some great building blocks and experience - but it feels like a language with support for first-class functions and co-routines would be better at it. Again, I'm just thinking out loud :)

-- Colm
Re: 3.0, the 2011 thread.
+1 amen to reliability coming first. We run all kinds of awful code in production at the ASF, and httpd's design papers over that elegantly. Losing that would be a terrible blow to the utility of the project.

Sent from my iPhone

On Jun 15, 2011, at 7:33 PM, Graham Leggett minf...@sharp.fm wrote: I am not convinced. ... Simply put, webservers need to tolerate error. They need to be built like tractors. ...
Re: 3.0, the 2011 thread.
On 6/15/11 7:40 PM, Colm MacCárthaigh c...@allcosts.net wrote: Imagine that in turn feeding into a set of co-routine filters. That's fundamentally different - it parallelises content generation, but it's really really hard to do in C. Depending on how far you want to push the model, it's not that hard. Obviously you can't do co-routines but just using the current ideas about requests and sub requests, you could easily do the subrequests in parallel. FWIW, nginx can use Lua co-routines to do this and does it natively with SSI's. The code, however, will make you go blind ;) My biggest issue with HTTPD really comes down to connections per OS image. In general, threads suck at this - memory per connection and context switches just kill you. C1M is just not that hard to achieve nowadays. -- Brian Akins
Re: mod_lua Filter Hook?
On 6/15/2011 6:09 PM, Akins, Brian wrote: On 6/15/11 6:26 PM, HyperHacker hyperhac...@gmail.com wrote: I'd been looking forward to mod_lua for a while now, expecting it to work similarly to PHP (handle requests, send output without having to worry about how the httpd works). Is that not the case? Brian M. can correct me, but the original intent for mod_lua (nee mod_wombat) was for when you needed to get at the internals of Apache, but didn't want to write a full-on C module. Like needing nested ifs for a rewrite, a strange auth method, or whatever. It was not really meant to be a competitor to php, ruby, python, etc. for application development. E.g. much of mod_perl's flexibility without the overhead of a perl interpreter instance.