On Sun, 2002-10-13 at 04:47, fabio rohrich wrote:
> HI!
> I wrote you last time about my development of a new
> apache module.
>
> mod_blanks: a module for the Apache web server which would on-the-fly
> remove unnecessary blank space, comments and other non-interesting
> things from the served page. Skills needed: the C language, a bit of
> text parsing techniques, HTML, learn Apache API. Complexity: low to
> moderate (after learning the API). Usefulness: moderate to low (but
> maybe better than that, it's a kind of nice toy topic that could be
> shown to save a lot of bandwidth on the Internet :-).
>
> So, the question is. I'm developing it for my bachelor thesis
> and my teacher told me it's too easy to develop it.
> So, have you some ideas, like something to do more (something
> like compression) or other things to add in the module.

If you want to stick with the mod_blanks idea but make it more advanced
(so that it's complicated enough to be a thesis project), here are a
few ideas:

* Removing extra spaces/comments/etc from HTML while delivering it is a
  good idea, but it's not necessarily something that you want your web
  server to do on every request. If you deliver the same page a hundred
  times per day (or a hundred times per second), it's wasteful to keep
  doing the same parsing work on the same file over and over. So one
  possibility is: make the module smart enough to cache the "optimized"
  versions of pages.

* Another challenge with mod_blanks is that there is a tradeoff between
  bandwidth cost and hardware cost. If you do a lot of processing to
  reduce the bytes sent (removing extraneous spaces, compression, etc),
  it will reduce your bandwidth cost, but you'll have to spend more on
  server hardware. And if your server suddenly gets a lot of traffic, it
  might be able to handle the extra load by itself, but not if it also
  has to do all the mod_blanks processing (the same idea applies to
  mod_deflate also). So one idea that might be interesting is: let the
  server administrator define which optional filters can be skipped when
  the server is heavily loaded. (An "optional" filter in this situation
  would mean something that we could skip without causing a bad response
  to be sent to the client. So mod_deflate counts as optional, for
  example, but mod_include doesn't.) Then, during request processing,
  decide whether to run the optional filters based on how overloaded the
  server is.

* One more idea: do some research to determine which is faster: removing
  blanks and comments, or just compressing the HTML. Or, to put it
  another way, build mod_blanks and compare its performance to
  mod_deflate. Mod_blanks would have an advantage, because it can use
  simpler and faster code. On the other hand, mod_deflate also has an
  advantage because it will result in a smaller block of bytes being
  written to the socket, which usually will reduce the CPU time spent in
  the kernel. Which one will win? Or is it better to do both: eliminate
  spaces and comments, and also compress?

Brian
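
P.S. If you want a feel for the last comparison before writing any C,
here is a minimal Python sketch (not Apache code, and not how either
module is actually implemented). It only compares output sizes for
stripping, deflating, and doing both; it does not measure CPU time,
which is the more interesting part of the question. The names
strip_blanks and compare_sizes are made up for illustration, and the
regex-based stripping is a deliberately naive assumption: a real filter
would have to leave <pre>, <textarea>, <script> and <style> alone.

```python
import re
import zlib

# Naive patterns, for illustration only. They ignore <pre>, <textarea>,
# <script>, and <style>, where whitespace and "<!--" can be significant.
COMMENT_RE = re.compile(rb"<!--.*?-->", re.DOTALL)
WHITESPACE_RE = re.compile(rb"\s+")


def strip_blanks(html: bytes) -> bytes:
    """Remove HTML comments, then collapse each whitespace run to one space."""
    without_comments = COMMENT_RE.sub(b"", html)
    return WHITESPACE_RE.sub(b" ", without_comments).strip()


def compare_sizes(html: bytes) -> dict:
    """Return the number of bytes each strategy would send for one page."""
    stripped = strip_blanks(html)
    return {
        "original": len(html),
        "stripped": len(stripped),
        "deflated": len(zlib.compress(html)),
        "stripped_then_deflated": len(zlib.compress(stripped)),
    }


if __name__ == "__main__":
    # Hypothetical sample page; substitute real pages from your own site.
    sample = b"""<html>
      <!-- navigation -->
      <body>
          <p>Hello,    world</p>
      </body>
    </html>
    """ * 50
    for name, size in compare_sizes(sample).items():
        print(f"{name:>24}: {size} bytes")
```

Running it over a set of real pages and adding timing around each step
would give a first rough data point for the thesis, though the numbers
that matter in the end are the ones measured inside the server.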