Re: [HACKERS] Using Threads?

2001-02-06 Thread Karel Zak

On Mon, 5 Feb 2001, Myron Scott wrote:

 I have put a new version of my multi-threaded
 postgresql experiment at
 
 http://www.sacadia.com/mtpg.html
 
 This one actually works.  I have added a server
 based on omniORB, a CORBA 2.3 ORB from ATT.  It
 is much smaller than TAO and uses the thread per
 connection model.  I haven't added the java side
 of the JNI interface yet but the C++ side is there.
 
 It's still not stable but it is much better than
 the last.

 Sorry, I haven't had time to look at and test your experiment,
but I have a question. How do you solve memory management?
The current mmgr is based on the global variable
CurrentMemoryContext, which is changed and used very often.
 Do you use locks for this? If so, that is probably a problematic
point for performance.

Karel




Re: [HACKERS] Using Threads

2001-02-06 Thread Karel Zak


On Tue, 6 Feb 2001, Myron Scott wrote:

 There are many many globals I had to work around including all the memory
 management stuff.  I basically threw everything into an "environment"
 variable which I stored as thread-specific data using thr_setspecific.

 Yes, that's good. I am working on a multi-threaded application server
(http://mape.jcu.cz) and I use some things from PG (like mmgr) for this
project; I am planning to use the same solution.

 Performance is actually very good for what I am doing.  I was able to batch
 commit transactions which cuts down on fsync calls, use prepared
 statements from my client using CORBA, and the various locking calls for
 the threads (cond_wait,mutex_lock, and sema_wait) seem pretty fast.  I did
 some performance tests for inserts 
 
 20 clients, 900 inserts per client, 1 insert per transaction, 4 different
 tables.
 
 7.0.2            About 10:52 average completion
 multi-threaded   2:42 average completion
 7.1beta3         1:13 average completion

That is a very, very good time for 7.1; I already look forward to 7.2! :-)

 BTW, I am not sure you will ever see threads in official PostgreSQL,
or whether you are spending your time on relevant things (IMHO).

Karel








Re: [HACKERS] Using Threads?

2001-02-05 Thread Myron Scott

I have put a new version of my multi-threaded
postgresql experiment at

http://www.sacadia.com/mtpg.html

This one actually works.  I have added a server
based on omniORB, a CORBA 2.3 ORB from ATT.  It
is much smaller than TAO and uses the thread per
connection model.  I haven't added the java side
of the JNI interface yet but the C++ side is there.

It's still not stable but it is much better than
the last.

Myron Scott
[EMAIL PROTECTED]







Re: [HACKERS] Using Threads?

2001-01-01 Thread Myron Scott

For anyone interested,

I have posted my multi-threaded version of PostgreSQL here.

http://www.sacadia.com/mtpg.html

It is based on 7.0.2 and the TAO CORBA ORB which is here.

http://www.cs.wustl.edu/~schmidt/TAO.html

Myron Scott
[EMAIL PROTECTED]





Re: [HACKERS] Using Threads?

2001-01-01 Thread Karel Zak


On Mon, 1 Jan 2001, Myron Scott wrote:

 For anyone interested,
 
 I have posted my multi-threaded version of PostgreSQL here.
 
 http://www.sacadia.com/mtpg.html

 How did you solve the locks? Via the original IPC, or did you rewrite it to use mutexes (etc.)?

Karel  




Re: [HACKERS] Using Threads?

2001-01-01 Thread Myron Scott

spinlocks rewritten to mutex_
locktable uses sema_
some cond_ in bufmgr.c

Myron


Karel Zak wrote:

 On Mon, 1 Jan 2001, Myron Scott wrote:
 
 
 For anyone interested,
 
 I have posted my multi-threaded version of PostgreSQL here.
 
 http://www.sacadia.com/mtpg.html
 
 
  How did you solve the locks? Via the original IPC, or did you rewrite it to use mutexes (etc.)?
 
   Karel  




Re: [HACKERS] Using Threads?

2000-12-08 Thread Bruce Momjian

 Adam Haberlach writes:
  Typically (on a well-written OS, at least), the spawning of a thread
  is much cheaper than the creation of a new process (via fork()).
 
 This would be well worth testing on some representative sample
 systems.
 
 Within the past year and a half at one of my gigs some coworkers did
 tests on various platforms (Irix, Solaris, a few variations of Linux
 and *BSDs) and concluded that in fact the threads implementations were
 often *slower* than using processes for moving and distributing the
 sorts of data that they were playing with.
 
 With copy-on-write and interprocess pipes that are roughly equivalent
 to memcpy() speeds it was determined for that application that the
 best way to split up tasks was fork()ing and dup().

This brings up a good point.   Threads are mostly useful when you have
multiple processes that need to share lots of data, and the interprocess
overhead is excessive.  Because we already have that shared memory area,
this benefit of threads doesn't buy us much.  We sort of already have
done the _shared_ part, and the addition of sharing our data pages is
not much of a win.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] Using Threads?

2000-12-08 Thread Bruce Momjian

 Bruce Guenter [EMAIL PROTECTED] writes:
  [ some very interesting datapoints ]
 
  So, forking a process with lots of data is expensive.  However, most of
  the PostgreSQL data is in a SysV IPC shared memory segment, which
  shouldn't affect the fork numbers.
 
 I believe (but don't have numbers to prove it) that most of the present
 backend startup time has *nothing* to do with thread vs process
 overhead.  Rather, the primary startup cost has to do with initializing
 datastructures, particularly the system-catalog caches.  A backend isn't
 going to get much real work done until it's slurped in a useful amount
 of catalog cache --- for example, until it's got the cache entries for
 pg_class and the indexes thereon, it's not going to accomplish anything
 at all.
 
 Switching to a thread model wouldn't help this cost a bit, unless
 we also switch to a shared cache model.  That's not necessarily a win
 when you consider the increased costs associated with cross-backend
 or cross-thread synchronization needed to access or update the cache.
 And if it *is* a win, we could get most of the same benefit in the
 multiple-process model by keeping the cache in shared memory.

Of course, we would also have to know which database was being used
next.  Each database's system catalog can be different.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] Using Threads?

2000-12-05 Thread Bruce Guenter

On Tue, Dec 05, 2000 at 10:07:37AM +0100, Zeugswetter Andreas SB wrote:
  And using the following program for timing thread creation 
  and cleanup:
  
  #include <pthread.h>
  
  threadfn() { pthread_exit(0); }
 
 I think you would mainly need to test how the system behaves, if 
 the threads and processes actually do some work in parallel, like:
 
 threadfn() {int i; for (i=0; i<1000;) {i++;} pthread_exit(0); }

The purpose of the benchmark was to time how long it took to create and
destroy a process or thread, nothing more.  It was not creating
processes in parallel for precisely that reason.  The point in dispute
was that threads took much less time to create than processes.

 In a good thread implementation 1 parallel processes tend to get way less
 cpu than 1 parallel threads, making threads optimal for the very many clients
 case (like > 3000).

Why do you believe this?  In the "classical" thread implementation, each
process would get the same amount of CPU, no matter how many threads were
running in it.  That would mean that many parallel processes would get
more CPU in total than many threads in one process.
-- 
Bruce Guenter [EMAIL PROTECTED]   http://em.ca/~bruceg/



Re: [HACKERS] Using Threads?

2000-12-05 Thread markw

I have been watching this threaded vs. non-threaded discussion and am completely with the
process-only crew for a couple of reasons, but let's look at a few things:

The process vs. threads benchmark, which showed 160us vs. 120us, measured only process
creation, not the delayed hit of the "copy on write" pages in the new process. Forking
is not as simple as the fork() call itself: once the forked process starts to work,
memory that is not explicitly shared is copied for the new process as it is modified.
So this is a hit, possibly a big hit. Threads are far more efficient; that really is
hard to debate.

I can see a number of reasons why a multithreaded version of a database would be good.
Asynchronous I/O perhaps, or even parallel joins, but with that being said, I think
stability and work are by far the governing factors. Introducing multiple threads into
a non-multithreaded code base invariably breaks everything.

So, we want to weigh the possible performance gains of multithreading against all the
work and effort needed to make it work reliably. The question is fundamentally, where
are we spending our time? If we are spending our time in context switches, then
multithreading may be a way of reducing this; however, in all the applications I have
built with postgres, it is always (like most databases) I/O bound or bound by computation.

I think the benefits of rewriting code to be multithreaded are seldom worth the work
and the risks, unless there is a clear advantage to doing so. I think most would agree
that any increase in performance gained by going multithreaded would be minimal, and
the amount of work to do so would be great.




Re: [HACKERS] Using Threads?

2000-12-05 Thread Bruce Guenter

On Tue, Dec 05, 2000 at 02:52:48PM -0500, Tom Lane wrote:
 There aren't going to be all that many data pages needing the COW
 treatment, because the postmaster uses very little data space of its
 own.  I think this would become an issue if we tried to have the
 postmaster pre-cache catalog information for backends, however (see
 my post elsewhere in this thread).

Would that pre-cached data not be placed in a SHM segment?  Such
segments don't do COW, so this would be a non-issue.
-- 
Bruce Guenter [EMAIL PROTECTED]   http://em.ca/~bruceg/



Re: [HACKERS] Using Threads?

2000-12-04 Thread Myron Scott

I may be wrong but I think that PGSQL is not threaded mostly due to
historical reasons.  It looks to me like the source has developed over
time where much of the source is not reentrant with many global variables
throughout.  In addition, the parser is generated by flex which
can be made to generate reentrant code but is still not thread safe b/c
global variables are used.

That being said, I experimented with the 7.0.2 source and came up with a
multithreaded backend for PGSQL which uses Solaris Threads. It seems to
work, but I drifted very far from the original source.  I
had to hack flex to generate threadsafe code as well.  I use it as a
linked library with my own fe-be protocol. This ended up being much much
more than I bargained for and, looking back, I probably would not have tried
it had I known any better.


Myron Scott


On Mon, 27 Nov 2000, Junfeng Zhang wrote:

 Hello all,
 
 I am new to postgreSQL. When I read the documents, I find out the Postmaster
 daemon actual spawns a new backend server process to serve a new client
 request. Why not use threads instead? Is that just for a historical reason,
 or some performance/implementation concern?
 
 Thank you very much.
 Junfeng
 




Re: [HACKERS] Using Threads?

2000-12-04 Thread The Hermit Hacker

On Mon, 27 Nov 2000, Junfeng Zhang wrote:

 Hello all,
 
 I am new to postgreSQL. When I read the documents, I find out the
 Postmaster daemon actual spawns a new backend server process to serve
 a new client request. Why not use threads instead? Is that just for a
 historical reason, or some performance/implementation concern?

Several reasons, 'historical' probably being the strongest right now
... since PostgreSQL was never designed for threading, it's about as
'un-thread-safe' as they come, and cleaning that up will/would be a
complete nightmare (should eventually be done, mind you) ...

The other is stability ... right now, if one backend drops away, for
whatever reason, it doesn't take down the whole system ... if you ran
things as one process, and that one process died, you just lost your whole
system ...





Re: [HACKERS] Using Threads?

2000-12-04 Thread The Hermit Hacker


if we were to do this in steps, I believe that one of the major problems
right now is that we have global variables up the wazoo ... my
'thread-awareness' is limited, as I've yet to use them, so excuse my
ignorance ... if we got patches that cleaned up the code in stages, moving
towards a cleaner code base, then we could get it into the main source
tree ... ?

 On Mon, 4 Dec 2000, Ross J. Reedstrom wrote:

 Myron - 
 Putting aside the fork/threads discussion for a moment (the reasons,
 both historical and other, such as inter-backend protection, are well
 covered in the archives), the work you did sounds like an interesting
 experiment in code redesign. Would you be willing to release the hacked
 code somewhere for others to learn from? Hacking flex to generate
 thread-safe code is of itself interesting, and the question about PG and
 threads comes up so often, that an example of why it's not a simple task
 would be useful.
 
 Ross
 
 On Mon, Dec 04, 2000 at 12:20:20AM -0800, Myron Scott wrote:
  I maybe wrong but I think that PGSQL is not threaded mostly due to
  historical reasons.  It looks to me like the source has developed over
  time where much of the source is not reentrant with many global variables
  throughout.  In addition, the parser is generated by flex which
  can be made to generate reentrant code but is still not thread safe b/c
  global variables are used.
  
  That being said, I experimented with the 7.0.2 source and came up with a
  multithreaded backend for PGSQL which uses Solaris Threads. It seems to
  work, but I drifted very far from the original source.  I
  had to hack flex to generate threadsafe code as well.  I use it as a
  linked library with my own fe-be protocol. This ended up being much much
  more than I bargained for and looking back would probably not have tried
  had I known any better.
  
  
  Myron Scott
  
 

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: [EMAIL PROTECTED]   secondary: scrappy@{freebsd|postgresql}.org 




Re: [HACKERS] Using Threads?

2000-12-04 Thread Tom Lane

The Hermit Hacker [EMAIL PROTECTED] writes:
 Why not use threads instead? Is that just for a
 historical reason, or some performance/implementation concern?

 Several reasons, 'historical' probably being the strongest right now
 ... since PostgreSQL was never designed for threading, its about as
 'un-thread-safe' as they come, and cleaning that up will/would be a
 complete nightmare (should eventually be done, mind you) ...

 The other is stability ... right now, if one backend drops away, for
 whatever reason, it doesn't take down the whole system ... if you ran
 things as one process, and that one process died, you just lost your whole
 system ...

Portability is another big reason --- using threads would create lots
of portability headaches for platforms that had no threads or an
incompatible threads library.  (Not to mention buggy threads libraries,
not-quite-thread-safe libc routines, yadda yadda.)

The amount of work required looks far out of proportion to the payoff...

regards, tom lane



Re: [HACKERS] Using Threads?

2000-12-04 Thread Junfeng Zhang

All the major operating systems should have POSIX threads implemented.
Actually this can be configurable--multithreads or one thread.

A thread-only server is unsafe, I agree. Maybe the following model would be a
little better: several servers, each multi-threaded. Every server can
support up to a maximum number of requests simultaneously. If anything bad
happens, it is limited to that server.

The downside of the process model is not the startup time. It is the
kernel resource and context-switch cost. Processes consume many more
kernel resources than threads, and have a much higher context-switch
cost. The threads model scales much better than the
process model.

-Junfeng

On Mon, 4 Dec 2000, Thomas Lockhart wrote:

  I am new to postgreSQL. When I read the documents, I find out the Postmaster
  daemon actual spawns a new backend server process to serve a new client
  request. Why not use threads instead? Is that just for a historical reason,
  or some performance/implementation concern?
 
 Both. Not all systems supported by PostgreSQL have a standards-compliant
 threading implementation (even more true for the systems PostgreSQL has
 supported over the years).
 
 But there are performance and reliability considerations too. A
 thread-only server is likely more brittle than a process-per-client
 implementation, since all threads share the same address space.
 Corruption in one server might more easily propagate to other servers.
 
 The time to start a backend is quite often small compared to the time
 required for a complete session, so imho the differences in absolute
 speed are not generally significant.
 
- Thomas
 




Re: [HACKERS] Using Threads?

2000-12-04 Thread Adam Haberlach

On Mon, Dec 04, 2000 at 02:28:10PM -0600, Bruce Guenter wrote:
 On Mon, Nov 27, 2000 at 11:42:24PM -0600, Junfeng Zhang wrote:
  I am new to postgreSQL. When I read the documents, I find out the Postmaster
  daemon actual spawns a new backend server process to serve a new client
  request. Why not use threads instead? Is that just for a historical reason,
  or some performance/implementation concern?
 
 Once all the questions regarding "why not" have been answered, it would
 be good to also ask "why use threads?"  Do they simplify the code?  Do
 they offer significant performance or efficiency gains?  What do they
 give, other than being buzzword compliant?

Typically (on a well-written OS, at least), the spawning of a thread
is much cheaper than the creation of a new process (via fork()).  Also,
since everything in a group of threads (I'll call 'em a team) shares the
same address space, there can be some memory overhead savings.

-- 
Adam Haberlach   |"California's the big burrito, Texas is the big
[EMAIL PROTECTED]  | taco ... and following that theme, Florida is
http://www.newsnipple.com| the big tamale ... and the only tamale that 
'88 EX500| counts any more." -- Dan Rather 



Re: [HACKERS] Using Threads?

2000-12-04 Thread Dan Lyke

Adam Haberlach writes:
 Typically (on a well-written OS, at least), the spawning of a thread
 is much cheaper than the creation of a new process (via fork()).

This would be well worth testing on some representative sample
systems.

Within the past year and a half at one of my gigs some coworkers did
tests on various platforms (Irix, Solaris, a few variations of Linux
and *BSDs) and concluded that in fact the threads implementations were
often *slower* than using processes for moving and distributing the
sorts of data that they were playing with.

With copy-on-write and interprocess pipes that are roughly equivalent
to memcpy() speeds it was determined for that application that the
best way to split up tasks was fork()ing and dup().

As always, your mileage will vary, but the one thing that consistently
amazes me on the Un*x like operating systems is that usually the
programmatically simplest way to implement something has been
optimized all to heck.

A lesson that comes hard to those of us who grew up on MS systems.

Dan



Re: [HACKERS] Using Threads?

2000-12-04 Thread Bruce Guenter

On Mon, Dec 04, 2000 at 03:17:00PM -0800, Adam Haberlach wrote:
   Typically (on a well-written OS, at least), the spawning of a thread
 is much cheaper than the creation of a new process (via fork()).

Unless I'm mistaken, the back-end is only forked when starting a new
connection, in which case the latency of doing the initial TCP three-way
handshake and start-up queries is much larger than any process creation cost.  On
Linux 2.2.16 on a 500MHz PIII, I can do the fork/exit/wait sequence in
about 164us.  On the same server, I can make/break a PostgreSQL
connection in about 19,000us (with 0% CPU idle, about 30% CPU system).
Even if we can manage to get a thread for free, and assume that the fork
from postmaster takes more than 164us, it won't make a big difference
once the other latencies are worked out.

 Also, since everything in a group of threads (I'll call 'em a team)

Actually, you call them a process.  That is the textbook definition.

 shares the
 same address space, there can be some memory overhead savings.

Only slightly.  All of the executable and libraries should already be
shared, as will all non-modified data.  If the data is modified by the
threads, you'll need separate copies for each thread anyway, so the net
difference is small.

I'm not denying there would be a difference.  Compared to separate
processes, threads are more efficient.  Doing a context switch between
threads means there are no PTE invalidations, which makes them quicker
than between processes.  Creation would be a bit faster due to just
linking in the VM to a new thread rather than marking it all as COW.
The memory savings would come from reduced fragmentation of the modified
data (if you have 1 byte modified on each of 100 pages, the thread would
grow by a few K, compared to 400K for processes).  I'm simply arguing
that the differences don't appear to be significant compared to the
other costs involved.
-- 
Bruce Guenter [EMAIL PROTECTED]   http://em.ca/~bruceg/



RE: [HACKERS] Using Threads?

2000-12-04 Thread Matthew

*snip*
  
  Once all the questions regarding "why not" have been answered, it would
  be good to also ask "why use threads?"  Do they simplify the code?  Do
  they offer significant performance or efficiency gains?  What do they
  give, other than being buzzword compliant?
 
The primary advantage that I see is that a single postgres process
can benefit from multiple processors. I see little advantage to using threads
for client connections.



Re: [HACKERS] Using Threads?

2000-12-04 Thread Bruce Guenter

On Mon, Dec 04, 2000 at 02:30:31PM -0800, Dan Lyke wrote:
 Adam Haberlach writes:
  Typically (on a well-written OS, at least), the spawning of a thread
  is much cheaper than the creation of a new process (via fork()).
 This would be well worth testing on some representative sample
 systems.

Using the following program for timing process creation and cleanup:

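#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
  int i;
  pid_t pid;
  /* 100000 iterations, consistent with 16.71 s total at 167us each */
  for (i=0; i<100000; ++i) {
    pid=fork();
    if(pid==-1) exit(1);    /* fork failed */
    if(!pid) _exit(0);      /* child exits immediately */
    waitpid(pid,0,0);       /* parent reaps the child */
  }
  exit(0);
}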

And using the following program for timing thread creation and cleanup:

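#include <pthread.h>
#include <stdlib.h>

/* thread body: exit immediately */
void *threadfn(void *arg) { pthread_exit(0); }

int main(void) {
  int i;
  pthread_t thread;
  /* 100000 iterations, consistent with 12.10 s total at 121us each */
  for (i=0; i<100000; ++i) {
    if (pthread_create(&thread, 0, threadfn, 0)) exit(1);
    if (pthread_join(thread, 0)) exit(1);
  }
  exit(0);
}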

On a relatively unloaded 500MHz PIII running Linux 2.2, the fork test
program took a minimum of 16.71 seconds to run (167us per
fork/exit/wait), and the thread test program took a minimum of 12.10
seconds to run (121us per pthread_create/exit/join).  I use the minimums
because those would be the runs where the tasks were least interfered
with by other tasks.  This amounts to a roughly 25% speed improvement
for threads over processes, for the null-process case.

If I add the following lines before the for loop:
  char* m;
  m=malloc(1024*1024);
  memset(m,0,1024*1024);
The cost for doing the fork balloons to 240us, whereas the cost for
doing the thread is constant.  So, the cost of marking the pages as COW
is quite significant (using those numbers, 73us/MB).

So, forking a process with lots of data is expensive.  However, most of
the PostgreSQL data is in a SysV IPC shared memory segment, which
shouldn't affect the fork numbers.
-- 
Bruce Guenter [EMAIL PROTECTED]   http://em.ca/~bruceg/



Re: [HACKERS] Using Threads?

2000-12-04 Thread Lamar Owen

Matthew wrote:
 The primary advantage that I see is that a single postgres process
 can benefit from multiple processors. I see little advantage to using thread
 for client connections.

Multiprocessors best benefit multiple backends.  And the current forked
model lends itself admirably to SMP.

And I say that even after using a multithreaded webserver (AOLserver)
for three and a half years.  Of course, AOLserver also sanely uses the
multi process PostgreSQL backends in a pooled fashion, but that's beside
the point.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11



Re: [HACKERS] Using Threads?

2000-12-04 Thread Myron Scott

I would love to distribute this code to anybody who wants it.  Any
suggestions for a good place?  However, calling the work a code redesign
is a bit generous.  This was more like a brute force hack.  I just moved
all the connection related global variables to a thread local
"environment variable" and bypassed much of the postmaster code.

I did this so I could port my app which was originally designed for
Oracle OCI and Java.  My app uses very few SQL statements but uses them
over and over.  I wanted true prepared statements linked to Java with JNI.
I got both as well as batched transaction writes ( which was more relevant
before WAL).  

In my situation, threads seemed much more flexible to implement, and I
probably could not have done the port without them.


Myron 

On Mon, 4 Dec 2000, Ross J. Reedstrom wrote:

 Myron - 
 Putting aside the fork/threads discussion for a moment (the reasons,
 both historical and other, such as inter-backend protection, are well
 covered in the archives), the work you did sounds like an interesting
 experiment in code redesign. Would you be willing to release the hacked
 code somewhere for others to learn from? Hacking flex to generate
 thread-safe code is of itself interesting, and the question about PG and
 threads comes up so often, that an example of why it's not a simple task
 would be useful.
 
 Ross
 




Re: [HACKERS] Using Threads?

2000-12-04 Thread Tom Lane

Bruce Guenter [EMAIL PROTECTED] writes:
 [ some very interesting datapoints ]

 So, forking a process with lots of data is expensive.  However, most of
 the PostgreSQL data is in a SysV IPC shared memory segment, which
 shouldn't affect the fork numbers.

I believe (but don't have numbers to prove it) that most of the present
backend startup time has *nothing* to do with thread vs process
overhead.  Rather, the primary startup cost has to do with initializing
datastructures, particularly the system-catalog caches.  A backend isn't
going to get much real work done until it's slurped in a useful amount
of catalog cache --- for example, until it's got the cache entries for
pg_class and the indexes thereon, it's not going to accomplish anything
at all.

Switching to a thread model wouldn't help this cost a bit, unless
we also switch to a shared cache model.  That's not necessarily a win
when you consider the increased costs associated with cross-backend
or cross-thread synchronization needed to access or update the cache.
And if it *is* a win, we could get most of the same benefit in the
multiple-process model by keeping the cache in shared memory.

The reason that a new backend has to do all this setup work for itself,
rather than inheriting preloaded cache entries via fork/copy-on-write
from the postmaster, is that the postmaster isn't part of the ring of
processes that can access the database files directly.  That was done
originally for robustness reasons: since the PM doesn't have to deal
with database access, cache invalidation messages, etc etc yadda yadda,
it is far simpler and less likely to crash than a real backend.  If we
conclude that shared syscache is not a reasonable idea, it might be
interesting to look into making the PM into a full-fledged backend
that maintains a basic set of cache entries, so that these entries are
immediately available to new backends.  But we'd have to take a real
hard look at the implications for system robustness/crash recovery.

In any case I think we're a long way away from the point where switching
to threads would make a big difference in connection startup time.

regards, tom lane



Re: [HACKERS] Using Threads?

2000-12-04 Thread Tom Samplonius


On Mon, 4 Dec 2000, Junfeng Zhang wrote:

 All the major operating systems should have POSIX threads implemented.
 Actually this can be configurable--multithreads or one thread.

  I don't understand this.  The OS can be configured for one thread?  How
would that be of any use?

 Thread-only server is unsafe, I agree. Maybe the following model can be a
 little better. Several servers, each is multi-threaded. Every server can
 support a maximum number of requests simultaneously. If anything bad
 happends, it is limited to that server. 

  There is no difference.  If anything bad happens with the current
multi-process server, all the postgres backends shut down because the
shared memory may be corrupted.

 The cons side of processes model is not the startup time. It is about
 kernel resource and context-switch cost. Processes consume much more
 kernel resource than threads, and have a much higher cost for context
 switch. The scalability of threads model is much better than that of
 processes model.

  What kernel resources does a process use?  There is some VM mapping
overhead, a process table entry, and a file descriptor table.  It is
possible to support thousands of processes today.  For instance,
ftp.freesoftware.com supports up to 5000 FTP connections using a slightly
modified ftpd (doesn't use inetd anymore).  That means with 5000 users
connected, that works out to 5000 processes active.  Amazing but true.

  Some OSes (Linux is the main one) implement threads as pseudo processes.
Linux threads are processes with a shared address space and file
descriptor table.

  Context switch cost for threads can be lower if you are switching to a
thread in the same process.  That of course assumes that all context
switches will occur within the same process, or the Linux
everything-is-a-process model isn't used.

 -Junfeng

Tom