Mark Kirkwood <[EMAIL PROTECTED]> writes:

> I think there is some confusion between "many concurrent connections +
> short transactions" and "many connect / disconnect + short transactions"
> in some of this discussion.
I had intended to clarify that but left it out. In fact I think that's
precisely one of the confusions that's obscuring things in this ongoing
debate.

Worrying about connection time is indeed a red herring. Most databases have
slow connection times, so most database drivers implement some form of
connection caching; a lot of effort has gone into working around this
particular database design deficiency (a minimal sketch of such a cache
appears below). However, even if you reuse existing database connections,
you are still context switching between hundreds or potentially thousands
of threads of execution. The lighter-weight that context switch is, the
faster the machine can do it.

For a web site where all the queries are preparsed, all the data is cached
in RAM, and all the queries are quick single-record lookups and updates,
the machine is often quite easily driven 100% CPU bound. The cost of those
context switches is tricky to evaluate because a big part of it is simply
the TLB flushes. Not only does a process context switch involve swapping in
memory maps and other housekeeping, but all future memory accesses, like
the data copies an OLTP system spends most of its time doing, are slowed
down as well.

The other question is how much memory having many processes running
consumes. Every page those processes use that could have been shared is a
page that isn't available for disk caching, and another page polluting the
processor's cache.

So, for example, I wonder how fast postgres would be with a thousand
connections open, all doing fast one-record index lookups as fast as they
can (a rough sketch of that experiment follows the cache sketch below).
People are going to say that would just be a poorly designed system, but I
think they're simply not applying much foresight. Reasonably designed
systems easily need several hundred connections now, and future large
systems will undoubtedly need thousands.

Anyways, this is a long-standing debate and the FAQ answer is mostly "we'll
find out when someone writes the code". Continuing to debate it isn't going
to be very productive. My only desire here is to see more people realize
that optimizing for tons of short transactions on data cached in RAM is at
least as important as optimizing for big complex transactions on huge
datasets.
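To make the connection-caching point concrete, here is a minimal sketch in
Python using the psycopg2 driver. The DSN and pool size are hypothetical,
and a real pool would add health checks, timeouts, and sizing logic; the
point is only that the connect cost is paid once up front and then
amortized across reuses.

import queue

import psycopg2

DSN = "dbname=test user=test"    # hypothetical connection string

class ConnectionCache:
    """Reuse open connections instead of paying the connect cost each time."""
    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            conn = psycopg2.connect(DSN)   # pay the connect cost once, up front
            conn.autocommit = True         # don't hold a transaction per checkout
            self._idle.put(conn)

    def get(self):
        return self._idle.get()    # blocks until a cached connection is free

    def put(self, conn):
        self._idle.put(conn)       # return it for reuse; no disconnect

pool = ConnectionCache(size=8)
conn = pool.get()
try:
    cur = conn.cursor()
    cur.execute("SELECT 1")
    print(cur.fetchone())
finally:
    pool.put(conn)                 # the connection stays warm for the next caller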
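And here is a rough sketch of the thousand-connection experiment. The table
name, key range, and DSN are hypothetical, and Python threads serialize on
the GIL, so this only approximates the client side; but each thread still
holds its own server backend open while doing nothing but single-record
index lookups, which is exactly the behavior in question.

import random
import threading
import time

import psycopg2

DSN = "dbname=test user=test"    # hypothetical
N_CONNECTIONS = 1000             # the thousand open connections
DURATION = 10                    # seconds to run
counts = []

def worker():
    conn = psycopg2.connect(DSN)
    conn.autocommit = True       # avoid holding a transaction open per lookup
    cur = conn.cursor()
    n = 0
    deadline = time.time() + DURATION
    while time.time() < deadline:
        # one-record lookup on an indexed key (hypothetical "items" table)
        cur.execute("SELECT val FROM items WHERE id = %s",
                    (random.randint(1, 1_000_000),))
        cur.fetchone()
        n += 1
    conn.close()
    counts.append(n)

threads = [threading.Thread(target=worker) for _ in range(N_CONNECTIONS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("lookups/sec:", sum(counts) / DURATION)

--
greg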