Mark Kirkwood <[EMAIL PROTECTED]> writes:

> I think there is some confusion between "many concurrent connections + short
> transactions" and "many connect / disconnect + short transactions" in some of
> this discussion.

I had intended to clarify that but left it out. In fact I think that's
precisely one of the confusions that's been obscuring this ongoing debate.

Worrying about connection time is indeed a red herring. Most databases are
slow to establish new connections, so most database drivers implement some
form of connection caching. A lot of effort has gone into working around this
particular database design deficiency.
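
To make the pattern concrete, here is roughly what that caching looks like
from the application side. This is just a sketch in Python using psycopg2's
built-in pool; the DSN and the users table are made up:

    from psycopg2.pool import SimpleConnectionPool

    # Establish a handful of connections once, up front, and reuse them
    # for every request instead of paying the connect cost each time.
    pool = SimpleConnectionPool(minconn=5, maxconn=20,
                                dsn="dbname=app user=app")

    def handle_request(user_id):
        conn = pool.getconn()       # borrow a cached connection
        try:
            cur = conn.cursor()
            cur.execute("SELECT name FROM users WHERE id = %s", (user_id,))
            return cur.fetchone()
        finally:
            pool.putconn(conn)      # return it for the next request

The connect cost is paid once per pooled connection rather than once per
request, which is the whole point of the workaround.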

However, even if you reuse existing database connections, you are nonetheless
still context switching between hundreds or potentially thousands of threads
of execution. The lighter-weight that context switch is, the faster the
machine can cycle through them.

For a web site where all the queries are pre-parsed, all the data is cached in
RAM, and every query is a quick single-record lookup or update, the machine is
often quite easily driven to 100% CPU.
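
The inner loop of such a system amounts to something like the following
sketch (Python with psycopg2 using PostgreSQL's PREPARE/EXECUTE; the users
table and connection string are made up):

    import psycopg2

    conn = psycopg2.connect("dbname=app user=app")
    conn.autocommit = True
    cur = conn.cursor()

    # Parse and plan the statement once per connection...
    cur.execute("PREPARE get_user (int) AS "
                "SELECT name, email FROM users WHERE id = $1")

    # ...so each request is reduced to a cheap B-tree probe against
    # pages that are already sitting in cache.
    def lookup(user_id):
        cur.execute("EXECUTE get_user (%s)", (user_id,))
        return cur.fetchone()

With no parsing, no planning, and no disk I/O per request, what's left is
mostly memory copies and context switches.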

It's tricky to evaluate the cost of the context switches because a big part of
the cost is simply the TLB flushes. Not only does a process context switch
involve swapping in memory maps and other housekeeping, but because the TLB is
flushed, all subsequent memory accesses (like the data copies an OLTP system
spends most of its time doing) are slowed down as well.

And the other question is how much memory having many processes running
consumes. Every page those processes use that could have been shared is a page
that isn't available for disk caching, and one more page polluting the
processor's cache.

So, for example, I wonder how fast postgres would be with a thousand
connections open, all doing one-record index lookups as fast as they can.
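
Concretely, the experiment would look something like this. A rough sketch in
Python with psycopg2; it assumes max_connections has been raised well above
the default, and the accounts table with its index on aid is made up:

    import threading, time
    import psycopg2

    N_CONNS = 1000     # one PostgreSQL backend process per connection
    DURATION = 60      # seconds
    counts = [0] * N_CONNS

    def worker(i):
        # Each thread holds its own connection, i.e. its own backend.
        conn = psycopg2.connect("dbname=bench user=bench")
        conn.autocommit = True
        cur = conn.cursor()
        deadline = time.time() + DURATION
        while time.time() < deadline:
            # Fast one-record index lookup on data that is hot in cache.
            cur.execute("SELECT abalance FROM accounts WHERE aid = %s", (i,))
            cur.fetchone()
            counts[i] += 1      # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(N_CONNS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("%.0f lookups/sec" % (sum(counts) / DURATION))

Running pgbench with -S (select-only) and -c 1000 asks much the same question
with less ceremony, if your setup will let you open that many clients.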

People are going to say that such a system is just poorly designed, but I
think they're not applying much foresight. Reasonably designed systems already
need several hundred connections today, and future large systems will
undoubtedly need thousands.

Anyway, this is a long-standing debate, and the FAQ answer is mostly "we'll
find out when someone writes the code." Continuing to debate it isn't going to
be very productive. My only desire here is to see more people realize that
optimizing for tons of short transactions on data cached in RAM is at least as
important as optimizing for big complex transactions on huge datasets.

--
greg

