On Wed, 2008-05-21 at 17:53 -0700, Graham Dumpleton wrote:
> On May 22, 5:20 am, Cliff Wells <[EMAIL PROTECTED]> wrote:
> > I think this is true for all of us. The difference is that the world
> > has changed in the last couple of years and now there are more options
> > to choose from. And by "options" I don't mean "a smaller, less capable
> > Apache clone", I mean a paradigm shift in how to handle high loads.
> > It's well known that threaded/process-based servers cannot scale beyond
> > a reasonable point. Nginx and Lighttpd are async and are specifically
> > written to address the C10K problem.
>
> There are two approaches one can use for addressing scalability:
> vertical scaling and horizontal scaling.
>
> In vertical scaling you just upgrade your existing single machine
> to a bigger, more capable machine. On this path then yes, nginx and
> lighttpd may give you more headroom than Apache. The problem with
> vertical scaling is cost, plus you will hit the limit of what the
> hardware can achieve much sooner than with horizontal scaling.
Except that vertical scaling doesn't preclude horizontal scaling; it merely
postpones the necessity of implementing it (if not the planning) and helps
limit its scope. If Nginx provides superior vertical scaling, then it will
also provide superior horizontal scaling, since vertically scaled systems
are the building blocks of a horizontally scaled system.

> With horizontal scaling you keep your existing machine and just add
> more machines. For horizontal scaling, the limit is going to be how
> easy it is to accommodate your application across a growing number of
> machines. The scalability of Apache here isn't generally going to be
> an issue, as you would have sufficient machines to spread the load so
> as to not unduly overload a single machine.
>
> Although one is buying more hardware with horizontal scaling, the
> cost/performance curve would generally increase at a lesser rate than
> with vertical scaling.

Again, I think this contrast is artificial. You are setting up vertical
scaling and horizontal scaling as mutually exclusive when they are anything
but, and unless you have endlessly deep pockets, you should prefer to
control the growth of your horizontal scaling.

> Of course, there is still a whole lot more to it than that, as you need
> to consider power costs, networking costs for hardware/software load
> balancing, failover and the possible need for multiple data centres
> distributed over different geographic locations.

Absolutely. And while hardware costs are dropping, hosting and power costs
are going up. My colocation fees have increased an average of 10% per year,
and power fees have quadrupled since I started. I don't expect this trend
to change any time soon.

> One thing that keeps troubling me about some of the discussions here
> as to which solution may be better than another is that they appear to
> focus on what solution may be best for static file sharing or proxying
> etc.
> One has to keep in mind that Python web applications have
> different requirements than these use cases. Python web applications
> also have different needs from PHP applications.

Given that an average web page is probably 70% or more static or cached
content, I think this is a critical aspect.

> As I originally pointed out, for Python web applications, in general
> any solution will do, as it isn't the network or the web server
> arrangement that will be the bottleneck. What does it matter if one
> solution is twice as fast as another for a simple hello world
> program, when the actual request time saved by that solution when
> applied to a real world application is far less than 1% of overall
> request time.

If you try to scale a dynamic application and pass part of every request
off to Python, you are going to either fail spectacularly or spend an
awful lot of money scaling horizontally. There's a reason people have
successfully deployed huge Rails apps, and it's not often by having 300
servers. They manage it by making sure that Rails is only called when
absolutely necessary and letting a fast webserver handle most of the load.
In any case, the same techniques are going to be applied regardless of
which web server you choose. The question is more "how much of my limited
and expensive resources is this single part of my stack going to consume,
and what benefit will I be getting for it?" Unless you require a specific
module, Nginx and Apache are more-or-less functionally equivalent, except
that one uses a fraction of the resources of the other.

> For non-database Python web applications, issues such as the GIL, and
> how multithreading and/or multiple processes are used, are going to be
> a bigger concern and have more impact on performance, inasmuch as
> running a single multithreaded process isn't going to cut it when
> scaling.
> Thus ease of configuring use of multiple processes is more important,
> as is the ability to recycle processes to avoid issues with increasing
> memory usage.

I'd consider "increasing memory usage" to be a bug in the application and
outside the scope of discussion. As far as ease of configuring multiple
processes goes, I use Nginx's built-in load balancing and a 4-line shell
script to start my application. Don't get me wrong, I think Apache's
process management is quite nice and I'd like to see something similar
added to Nginx, but it's hardly a show-stopper.

> There is also the balance between having fixed numbers of processes,
> as is necessary when using fastcgi-like approaches, and the ability in
> something like Apache to dynamically adjust the number of processes to
> handle requests.

Remember you said this (see below*).

> Add databases into the mix and you get into a whole new bunch of
> issues, which others are already discussing.
>
> Memory usage in all of this is a big issue, and granted that for
> static file serving nginx and httpd will consume less memory. The
> difference though for a dynamic Python web application isn't going to
> be that marked.

I disagree. As I mentioned earlier, someone I know recently took an
Apache/mod_php application consuming 1.2GB of RAM down to 200MB using
Nginx/FastCGI with no loss in performance or functionality. It's not
clear to me why a Python application would be much different.

> If you are running an 80MB Python web application process, it is still
> going to be about that size whatever hosting solution you use. This is
> because the memory usage is from the Python web application, not the
> underlying web server. The problem is more to do with how you manage
> the multiple instances of that 80MB process.

Sort of. However, consider this: if I am running Nginx I can reasonably
*fill* a single server with Python processes and not worry too much about
how much memory Nginx consumes.
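The kind of setup described above could be sketched roughly like this.
This is a hypothetical fragment, not taken from anyone's actual
configuration; the ports, paths and server names are invented for
illustration:

```nginx
# Pool of identical Python (e.g. Pylons/Paste) backend processes,
# started separately by a small shell script; ports are illustrative.
upstream pylons_backends {
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
    server 127.0.0.1:5003;
}

server {
    listen 80;
    server_name example.com;

    # Static and cached content is served directly and never touches
    # a Python process.
    location /static/ {
        root /var/www/myapp;
        expires 30d;
    }

    # Only the genuinely dynamic requests are proxied to the pool;
    # nginx round-robins across the backends by default.
    location / {
        proxy_pass http://pylons_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The point being: the load-balancing and static/dynamic split live in a
few lines of config, and all the machine's memory beyond nginx's small
footprint is left for the application processes themselves.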
The resources are available for running the *application* rather than the
webserver. Because the Python application will undoubtedly be one of the
first bottlenecks (database next), the ability to horizontally scale the
application (by running multiple instances) is critical. By using up
system resources, Apache limits the number of instances of the
application that can be run on a single machine, and by extension across
multiple machines.

> There have been discussions over on the Python WEB-SIG about making
> WSGI better support asynchronous web servers. Part of their rationale
> was that it gave better scalability because it could handle more
> concurrent requests and wouldn't be restricted by the number of
> threads being used. The problem that was pointed out to them, which
> they then didn't address, is that where one is handling more
> concurrent requests, the transient memory requirements of your process
> can theoretically be greater.
>
> At least where you have a set number of threads you can get a handle
> on what maximum memory usage may be by looking at the maximum
> transient requirements of your worst request handler.

Then you agree that dynamically adjusting the process pool size is bad,
since it would have the same net effect? This appears (to me) to
contradict what you claimed as a feature earlier [*].

> With an asynchronous model where theoretically an unbounded number of
> concurrent requests could be handled at the same time, you could
> really blow out your memory requirements if they all hit the same
> memory-hungry request handler at the same time. Thus a more
> traditional synchronous model can give you more predictability, which
> for large systems can in itself be an important consideration.

Of course, this is where your earlier suggestion of using a hardware
load-balancer would be a good idea.
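The predictability argument above can be made concrete with a small
sketch: if you cap in-flight requests, worst-case transient memory is
roughly (cap x worst single-request usage), which an unbounded async
server can't promise. This is a minimal illustration I've written for
this discussion, not code from the thread; the name `bounded` and the
numbers are invented:

```python
# Sketch: cap concurrent requests in a WSGI app so peak transient
# memory stays predictable (cap x worst-case handler usage).
import threading

def bounded(app, max_concurrent=8):
    """Wrap a WSGI app so at most `max_concurrent` requests run at once."""
    gate = threading.BoundedSemaphore(max_concurrent)

    def middleware(environ, start_response):
        # Callers beyond the cap block here instead of piling unbounded
        # concurrent work (and its transient memory) into the process.
        with gate:
            return app(environ, start_response)

    return middleware

# A trivial WSGI app standing in for a memory-hungry request handler.
def hello(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']

app = bounded(hello, max_concurrent=4)
```

A threaded server with a fixed pool gives you the same bound implicitly;
the point is only that the bound exists at all, which is what the
unbounded asynchronous model gives up.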
I think a much better use of resources (read "money") would be spending
some of it on a dedicated load-balancing solution which can control how
requests are distributed, rather than repurposing inefficiency into a
feature.

At any rate, I don't actually think the above has much to do with Nginx
vs Apache as Pylons deployment options. Because Pylons tends to be run as
a threaded app (is anyone doing otherwise?), we still have the same
predictability. In fact our predictability is easier to calculate, since
we don't need to account for the web server's memory explosion in
addition to our application's needs.

In all of the above, I haven't seen any explanation from you as to why
Apache would be superior to Nginx as a deployment option, only that it
wouldn't be the worst bottleneck in your application stack. Not terribly
convincing. If we were discussing a closed-source solution versus an open
source solution, this might be sufficient ("good enough"), but that's not
the case here.

I'll give you a quick list of actual benefits I see from using Nginx:

1) low CPU overhead
2) small memory footprint
3) consistent latency for responses
4) scalable in all directions
5) simple and syntactically consistent configuration

Benefits I see for Apache:

1) excellent documentation
2) wide array of modules, especially esoteric ones
3) mod_wsgi provides a slightly more efficient communication gateway to
   Python backends
4) automatic process management (restarting backends)

Of Apache's benefits, I see 1) as mostly moot due to Nginx's simplicity,
2) as completely moot since I don't use them, 3) as not enough to
overcome the efficiency lost elsewhere, and 4) as mostly moot because
it's simple to solve in other ways. This probably doesn't exactly match
other people's requirements, and certainly there are other considerations
that might tip the scales one way or the other.

> Anyway, this is getting a fair bit off topic and since others are
> seeing my rambles as such, I'll try and refrain in future.
:-) Please don't. You happen to be one of the few feather-heads I don't
mind hearing from, even if I find your arguments kind of slippery ;-)
And incidentally, congrats on your baby =)

For people who care more about numbers than theoretical discussions (aka
"obstinate"), please refer to the following, which provides a fairly
decent overview of resource utilization between the two servers:

http://www.joeandmotorboat.com/2008/02/28/apache-vs-nginx-web-server-performance-deathmatch/

Regards,
Cliff

You received this message because you are subscribed to the Google Groups
"pylons-discuss" group.