Re: [HACKERS] PostgreSQL pre-fork speedup
I ran the new Pgpool 1.2.2. It is faster over TCP than 1.2.0 but still slower than over a UNIX socket. I used the same script as before.

TCP socket (Pgpool 1.2.0)
  2.39 sec

TCP socket (Pgpool 1.2.2)
  0.80 sec
  0.80 sec
  0.79 sec

UNIX socket (Pgpool 1.2.2)
  0.026 sec
  0.027 sec
  0.027 sec

Direct TCP connection (no pgpool)
  0.16 sec
  0.15 sec
  0.16 sec

Pgpool over TCP is still slower than a direct connection, but much faster than v1.2.0. Are there any other areas that can be improved?

Regards,
Re: [HACKERS] PostgreSQL pre-fork speedup
Yes, I realize it's a bit old, but I just wanted to make the small point that forking is slower. It's funny you should ask, because in the 2.4 kernel thread creation on Linux has in fact improved much more than process creation. A benchmark at IBM shows Linux 2.4 thread creation is 30x faster than process creation. Process creation on Windows 2000 takes about twice as long as process creation on Linux. This means forking on Win32 will be 2x slower! See the 2002 benchmark below:

http://www-106.ibm.com/developerworks/linux/library/l-rt7/?Opent=grl,l=252,p=mgth

Cheers,

--- Andrew Dunstan [EMAIL PROTECTED] wrote:
> sdv mailer said:
> > Forking is expensive on many systems. Linux is a bit better but still expensive compared to threads. On Windows, creating process is much more expensive than on Linux. Check this benchmark:
> >
> > http://cs.nmu.edu/~randy/Research/Papers/Scheduler/understanding.html
> >
> > Forking shouldn't be taken lightly as free thing. There are pros and cons. The general trend is going towards threads, but that's a different issue.
>
> This article shows a 3x speedup for thread creation over fork(), not the numbers you have quoted. Furthermore, it talks about Linux kernel 2.0.30. Do you know how old that is? The paper itself comes from Linux Journal, January 1999, according to the author's web site.
>
> Argument will get you nowhere - if you want it done then do it and prove everyone wrong.
>
> cheers
> andrew
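Since the figures quoted in this exchange (3x, 10x, 30x) come from different kernels and years, one way to settle the question for a given machine is to measure it directly. The following is only a rough sketch in Python, not one of the benchmarks cited above: the function names are made up for illustration, it needs a Unix-like system for os.fork(), and interpreter overhead means the absolute numbers will not match a C benchmark.

```python
import os
import threading
import time


def time_forks(n=50):
    """Average seconds to fork a child that exits at once and reap it."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)        # child: exit immediately, skip cleanup handlers
        os.waitpid(pid, 0)     # parent: wait for the child to finish
    return (time.perf_counter() - start) / n


def time_threads(n=50):
    """Average seconds to start a thread that does nothing and join it."""
    start = time.perf_counter()
    for _ in range(n):
        worker = threading.Thread(target=lambda: None)
        worker.start()
        worker.join()
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    fork_s = time_forks()
    thread_s = time_threads()
    print("fork:   %.1f us" % (fork_s * 1e6))
    print("thread: %.1f us" % (thread_s * 1e6))
    print("ratio:  %.1fx" % (fork_s / thread_s))
```

The ratio printed depends heavily on the kernel, the C library and the size of the parent process, which is part of why the published numbers disagree.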
Re: [HACKERS] PostgreSQL pre-fork speedup
Ok, I ran some benchmarks on my Linux box (AMD 1.2 GHz, 256 MB, Fedora Core 1, Linux 2.4.20-8) using Pgpool 1.2 and PostgreSQL 7.4. I ran the benchmark script repeatedly (10+ times each).

Connecting is 5x faster using Pgpool on a UNIX socket, which is encouraging. This shows pre-fork does speed things up, even with the overhead incurred by the proxy. However, when I tried a TCP socket, Pgpool was actually 15x slower! Tatsuo, perhaps you can clarify why the TCP socket is so much slower?

PHP connecting on UNIX socket
  Without pgpool: 0.144 sec
  With pgpool   : 0.027 sec

PHP connecting on TCP socket
  Without pgpool: 0.152 sec
  With pgpool   : 2.39 sec

    <?php
    // getmicrotime() is a helper defined elsewhere that returns the
    // current time in seconds as a float.
    $time_start = getmicrotime();
    for ($i = 0; $i < 20; $i++) {
        // Uncomment exactly one of the pg_connect() lines per run.

        // With pgpool on UNIX socket
        //$DBH = pg_connect('dbname=test1 port= user=postgres');

        // With pgpool on TCP socket
        //$DBH = pg_connect('dbname=test1 host=127.0.0.1 port= user=postgres');

        // Without pgpool on UNIX socket
        //$DBH = pg_connect('dbname=test1 user=postgres');

        // Without pgpool on TCP socket
        //$DBH = pg_connect('dbname=test1 host=127.0.0.1 user=postgres');

        $Res = pg_exec($DBH, 'SELECT 1');
        pg_close($DBH);
    }
    // Total wall-clock time for 20 connect/query/disconnect cycles.
    $Time = getmicrotime() - $time_start;
    ?>

I only changed the pgpool configuration where it says:

    allow_inet_domain_socket = 1
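For readers who want to repeat this kind of measurement outside PHP, the same connect/query/close loop can be sketched in Python. This is an illustration rather than the script used for the numbers above: time_connect_loop and FakeConnection are hypothetical names, and no database driver is assumed. A real run would pass in a function that opens a connection with whichever driver is installed.

```python
import time


def time_connect_loop(connect, iterations=20, query=None):
    """Return wall-clock seconds for `iterations` connect/query/close cycles.

    `connect` is any zero-argument callable returning an object with close();
    `query`, if given, is called with the connection once per cycle.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        conn = connect()
        try:
            if query is not None:
                query(conn)
        finally:
            conn.close()   # always disconnect, as the PHP loop does
    return time.perf_counter() - start


class FakeConnection:
    """Stand-in used for a dry run; it is not a real database connection."""

    closed = 0

    def close(self):
        FakeConnection.closed += 1


if __name__ == "__main__":
    elapsed = time_connect_loop(FakeConnection, iterations=20)
    print("20 fake connections took %.6f sec" % elapsed)
```

Passing the connect step as a callable keeps the timing loop identical across the four cases (UNIX or TCP, with or without a pooler), so only the connection parameters change between runs.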
Re: [HACKERS] PostgreSQL pre-fork speedup
Pgpool connects to PostgreSQL on a UNIX socket. I also ran it over a TCP socket, but if I recall correctly there was no significant difference, due to the inherent nature of connection pooling or pre-fork technology. ;-)

--- Rod Taylor [EMAIL PROTECTED] wrote:
> > However, when I tried TCP socket, Pgpool was actually slower by 15x !! Perhaps you can clarify why the TCP socket is so much slower?
>
> How did you have pgpool configured to connect to the database? Domain socket or tcpip?
Re: [HACKERS] PostgreSQL pre-fork speedup
Hi Bruce,

Sorry for the confusion; Rod asked a question and I answered too quickly. This is what I mean.

15x slower:
  Client --TCP-- PgPool --UNIX-- PostgreSQL
  Client --TCP-- PgPool --TCP-- PostgreSQL

5x faster:
  Client --UNIX-- PgPool --UNIX-- PostgreSQL
  Client --UNIX-- PgPool --TCP-- PostgreSQL

Hope this helps! Pgpool speeds up connection time by 5x over a UNIX socket thanks to pre-fork and connection pooling. However, Pgpool slows connections down by 15x over a TCP socket for some unknown reason.
Re: [HACKERS] PostgreSQL pre-fork speedup
No SSL. No authentication either. Just friendly handshakes.
Re: [HACKERS] PostgreSQL pre-fork speedup
I compared against both TCP and UNIX direct connections. No SSL, no authentication. See the benchmark results, posted again below:

Direct
  0.144 sec   Client --UNIX-- PG
  0.152 sec   Client --TCP-- PG

5x faster
  0.027 sec   Client --UNIX-- Pgpool --UNIX-- PG
  0.028 sec   Client --UNIX-- Pgpool --TCP-- PG

15x slower
  2.39 sec    Client --TCP-- Pgpool --UNIX-- PG
  2.40 sec    Client --TCP-- Pgpool --TCP-- PG
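The "5x" and "15x" labels can be checked against the timings with a few lines of arithmetic. The sketch below only restates the six numbers from this message; the dictionary layout and helper names are illustrative, and the figure of 20 iterations is taken from the benchmark script posted earlier in the thread.

```python
ITERATIONS = 20  # connections per run in the benchmark script

# (client-side socket, pgpool-to-PostgreSQL socket or None if direct) -> seconds
results = {
    ("UNIX", None): 0.144,
    ("TCP", None): 0.152,
    ("UNIX", "UNIX"): 0.027,
    ("UNIX", "TCP"): 0.028,
    ("TCP", "UNIX"): 2.39,
    ("TCP", "TCP"): 2.40,
}


def speedup(client_socket, backend_socket):
    """Direct time divided by pooled time for the same client-side socket."""
    direct = results[(client_socket, None)]
    pooled = results[(client_socket, backend_socket)]
    return direct / pooled


def per_connection_ms(key):
    """Average milliseconds per connect/query/close cycle."""
    return results[key] / ITERATIONS * 1000


if __name__ == "__main__":
    print("UNIX client via pgpool: %.1fx faster" % speedup("UNIX", "UNIX"))
    print("TCP client via pgpool:  %.1fx slower" % (1 / speedup("TCP", "UNIX")))
    print("TCP via pgpool: %.1f ms per connection" % per_connection_ms(("TCP", "UNIX")))
```

The per-connection view is worth a look: the TCP-through-pgpool case works out to roughly 120 ms per connection, a delay far larger than any of the other paths.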
Re: [HACKERS] PostgreSQL pre-fork speedup
Nope. I commented out that block of code at 372 and no difference.
Re: [HACKERS] PostgreSQL pre-fork speedup
Tom,

You're correct that the test measures a hot backend and not forking. How much of the startup cost is cache initialization, relative to the forking itself? And what would the impact be on Win32, given that process creation there is twice as slow as on Linux?
Re: [HACKERS] PostgreSQL pre-fork speedup
I don't think I can volunteer for this, as I am already actively volunteering for another open project. I was hoping someone could take this up, since one of the recent threads mentioned that we don't have anything substantial to present for 7.5 if June 1 is the deadline for code freeze. Pre-fork came to mind. :-)

As for proof of concept, I think pgpool from Tatsuo Ishii is a good indication that pre-fork works. I'll see if I can generate some benchmarks using pgpool on my Linux box. PgPool is a server-side connection pool/load balancer/replicator that implements pre-fork, but because it acts as a proxy there is 7% to 15% overhead according to his README file.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg44082.html

--- Andrew Dunstan [EMAIL PROTECTED] wrote:
> sdv mailer wrote:
> [snip]
> > Pre-fork will give MySQL one less argument to throw at PostgreSQL. I think optimizing is this area will speed up the general case for everyone rather than optimizing a feature that affects 10% of the users. On top of that, it will make a strong marketing case because forking will no longer become a speed issue when compared to MySQL.
>
> So when can we expect to see your proof of concept code and benchmarks to show the speedup achieved?
>
> cheers
> andrew
Re: [HACKERS] PostgreSQL pre-fork speedup
I'm talking about connecting to multiple database servers on separate machines. Schemas don't apply here. How much work would it take to make a pre-fork smart enough to open different databases on incoming connections? How much of it can be modeled after Apache?
Re: [HACKERS] PostgreSQL pre-fork speedup
Pre-fork does not mean idle connections! Pre-fork scales with database load, whereas persistent connections scale with web server load. A web server that is heavily loaded, but not necessarily performing a lot of database activity, will spawn hundreds of idle database connections when using persistent connections. With pre-fork, you can potentially lower this to as few as 10 open connections.

Forking is quite fast on Linux, but creating a new process is still 10x more expensive than creating a thread, and it is even worse on the Win32 platform. CPU load goes up because the OS needs to allocate and deallocate memory, making it difficult to reach steady-state resource consumption.

More importantly, solving the forking delay will have a big impact on the minds of people who have been given the impression that forking is very, very slow. Here's what one site has to say about PostgreSQL's forking:

http://www.geocities.com/mailsoftware42/db/

  "Postgres forks on every incoming connection - and the forking process and backend setup is a bit slow, but one can speed up PostgreSQL by coding things as stored procedures"

Pre-fork will give MySQL one less argument to throw at PostgreSQL. I think optimizing in this area will speed up the general case for everyone, rather than optimizing a feature that affects 10% of the users. On top of that, it will make a strong marketing case, because forking will no longer be a speed issue when compared to MySQL.
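The scaling argument above can be put in numbers with a deliberately simplified model. Everything in this sketch is an assumption made for illustration: the worker count and the number of queries in flight are invented, and the model simply assumes that each web server worker process keeps one persistent connection open per database server it talks to.

```python
def open_connections_persistent(web_workers, db_servers=1):
    """Persistent connections: one per web worker per database server."""
    return web_workers * db_servers


def open_connections_on_demand(queries_in_flight):
    """Connect-per-query: open connections track the queries being run."""
    return queries_in_flight


web_workers = 300         # hypothetical number of busy web server processes
queries_in_flight = 10    # hypothetical number of concurrent database queries

persistent = open_connections_persistent(web_workers)
on_demand = open_connections_on_demand(queries_in_flight)
idle = persistent - queries_in_flight

print(persistent, on_demand, idle)
```

Under these assumed numbers the database holds 300 connections of which 290 sit idle, against 10 when connections follow the query load. The first figure grows with the web tier and the second with the database workload, which is the distinction the message draws.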
Re: [HACKERS] PostgreSQL pre-fork speedup
I've already tried pooling (SQLRelay) and persistent connections (PHP). They may work for other people, but they do not work for me. I have already separated static from database-driven code, but you can never balance web server load with database server load. Pre-fork scales with database load and not with web server load. This point is crucial.

Most people paying $5.99/mo for web hosting don't have access to persistent connections or connection pooling under PHP. Maybe this is why MySQL is favored among them. I'm not saying this is my case, but there is a general need for speedier connections. If you can satisfy the needs of the masses, then you have practically won their vote. Currently MySQL connects 10x faster than PostgreSQL. See my last benchmark.
Re: [HACKERS] PostgreSQL pre-fork speedup
Forking is expensive on many systems. Linux is a bit better, but forking is still expensive compared to threads. On Windows, creating a process is much more expensive than on Linux. Check this benchmark:

http://cs.nmu.edu/~randy/Research/Papers/Scheduler/understanding.html

Forking shouldn't be taken lightly as a free thing. There are pros and cons. The general trend is towards threads, but that's a different issue.

--- scott.marlowe [EMAIL PROTECTED] wrote:
> On Wed, 5 May 2004, sdv mailer wrote:
> > Forking is quite fast on Linux but creating a new process is still 10x more expensive than creating a thread and is even worse on Win32 platform. CPU load goes up because the OS needs to allocate/deallocate memory making it difficult to get a steady state resource consumption.
>
> Just a nit to pick here. In Linux, the difference between forking and spawning a new thread is almost nothing. Definitely less than a factor of 2, and most assuredly less than the quoted factor of 10 here.
>
> The fact that windows has a heavy process / lightweight thread design means little to me, since I'll likely never deploy a production postgresql server on it that needs to handle any serious load.
Re: [HACKERS] PostgreSQL pre-fork speedup
I'll pretend I didn't see that last comment on Windows. I wouldn't want to disappoint the users, myself included, who are eagerly waiting for the Win32 port to be completed. ;-)

Having said that, I think it's all the more reason to get a working pre-fork for Win32. Don't you think so?

--- scott.marlowe [EMAIL PROTECTED] wrote:
> On Wed, 5 May 2004, sdv mailer wrote:
> > Forking is quite fast on Linux but creating a new process is still 10x more expensive than creating a thread and is even worse on Win32 platform. CPU load goes up because the OS needs to allocate/deallocate memory making it difficult to get a steady state resource consumption.
>
> Just a nit to pick here. In Linux, the difference between forking and spawning a new thread is almost nothing. Definitely less than a factor of 2, and most assuredly less than the quoted factor of 10 here.
>
> The fact that windows has a heavy process / lightweight thread design means little to me, since I'll likely never deploy a production postgresql server on it that needs to handle any serious load.
Re: [HACKERS] PostgreSQL pre-fork speedup
We used to run persistent connections until the DB servers got maxed out because too many idle connections were sucking up all the memory. Web servers run different loads than database servers, and persistent connections are notorious for crashing your DB. Connection pooling (e.g. SQLRelay) didn't work either, because we needed to connect to hundreds of DB servers from each web server. Imagine having 200+ open connections on the web server, and how many of these connections remain idle. The situation gets worse when you multiply by an even greater number of web servers connected to all these database servers. Do the math! We're talking about a large server farm here, not 2 or 3 machines.

Saving that X ms can be substantial for a large number of simultaneous connections and shouldn't be neglected; otherwise, why have persistent connections or connection pooling in the first place? Imagine every query using up that X ms just for connecting/forking. From experience, it adds up to a lot.

I think pre-forking can be beneficial, and it is a lot simpler than rewriting the DB server to be multi-threaded. Pre-forking would not consume as much memory as persistent connections, because it scales with the database load and NOT with the web server load. I'm guessing pre-forking will be of more benefit on systems where launching a new process is expensive (Win32, certain UNIXes).

Here's a snippet from one of the Apache conferences:

  "Traditionally TCP/IP servers fork a new child to handle incoming requests from clients. However, in the situation of a busy web site, the overhead of forking a huge number of children will simply suffocate the server. As a consequence, Apache uses a different technique. It forks a fixed number of children right from the beginning. The children service incoming requests independently, using different address spaces. Apache can dynamically control the number of children it forks based on current load. This design has worked well and proved to be both reliable and efficient; one of its best features is that the server can survive the death of children and is also reliable. It is also more efficient than the canonical UNIX model of forking a new child for every request."

Besides solving my own problems, having a pre-fork solution will benefit PostgreSQL too. MySQL is reputed to have fast connections, and people know it because you cannot avoid simple queries (e.g. counters, session retrieval, etc.). The truth of the matter is that many people still operate on a connect/query/disconnect model running simple queries, and if you can satisfy these people then it can be a big marketing win for PostgreSQL. Many web hosting companies out there don't allow persistent connections, which is where MySQL shines.

Over and over again, we hear people say how MySQL is fast for the Web because it can connect and execute simple queries quickly. Take for instance

http://www-css.fnal.gov/dsg/external/freeware/pgsql-vs-mysql.html

  "MySQL handles connections very fast, thus making it suitable to use MySQL for Web - if you have hundreds of CGIs connecting/disconnecting all the time you'd like to avoid long startup procedures."

and

http://www-css.fnal.gov/dsg/external/freeware/Repl_mysql_vs_psql.html

  "MySQL handles connections and simple SELECTs very fast."

PostgreSQL is likely just as fast, but if people don't see that on the first try when running a simple query, then MySQL has already won the war when it comes to speed.

Another benchmark I came across:

http://www.randomnetworks.com/joseph/blog/?eid=101
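To make the pre-fork model described in the Apache quote concrete, here is a minimal sketch in Python. It is an illustration of the general technique only, and does not show how Apache, pgpool or PostgreSQL implement it: the function names are invented, each child handles a fixed number of requests and simply replies with its own process ID, and os.fork() restricts it to Unix-like systems.

```python
import os
import socket


def prefork_pid_server(num_children=3, requests_per_child=1):
    """Open a listening socket, fork the children up front, return (port, pids).

    Every child inherits the same listening socket and blocks in accept(),
    so a process is already waiting when a client connects.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS choose a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_children):
        pid = os.fork()
        if pid == 0:
            # Child: serve a fixed number of requests, then exit.
            for _request in range(requests_per_child):
                conn, _addr = listener.accept()
                with conn:
                    conn.sendall(str(os.getpid()).encode())
            os._exit(0)
        pids.append(pid)

    listener.close()   # the parent does not accept; the children keep their copies
    return port, pids


def ask(port):
    """Connect once and return the process ID of the child that answered."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        return int(client.recv(64).decode())


if __name__ == "__main__":
    port, pids = prefork_pid_server(num_children=3, requests_per_child=1)
    answered = [ask(port) for _ in range(3)]
    for pid in pids:
        os.waitpid(pid, 0)             # reap the children
    print("children:", sorted(pids))
    print("answered:", sorted(answered))
```

A real server would also replace children that exit and grow or shrink the pool with load, as the quote describes. For a database there is the further complication raised elsewhere in this thread: a pre-forked backend does not know in advance which database the next client will ask for.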
Re: [HACKERS] PostgreSQL pre-fork speedup
Forking consumes a large amount of CPU when you have many simultaneous connections, and it adds to the latency. MySQL users in particular may think PostgreSQL's connection time is much slower, because these users tend to perform relatively simple queries.

In my case, connection pooling and persistent connections are useless for a large server farm consisting of hundreds of partitioned and replicated servers doing only simple queries.

Below is a benchmark of MySQL 3.2 and PostgreSQL 7.4 doing multiple connects/disconnects within the same server (AMD 1.2 GHz, 512 MB, Linux 2.4). If forking is the issue, then pre-forking will give a big boost, especially for simple queries:

MySQL time
  0.012786865234375
  0.011546850204468
  0.01167106628418

    <?php
    // getmicrotime() is a helper defined elsewhere that returns the
    // current time in seconds as a float.
    $time_start = getmicrotime();
    for ($i = 0; $i < 20; $i++) {
        $DBH = mysql_connect('127.0.0.1');
        mysql_select_db('test1');
        mysql_close($DBH);
    }
    $Time = getmicrotime() - $time_start;
    ?>

MySQL time (with simple query)
  0.015650987625122
  0.01443886756897
  0.014433860778809

    <?php
    $time_start = getmicrotime();
    for ($i = 0; $i < 20; $i++) {
        $DBH = mysql_connect('127.0.0.1');
        mysql_select_db('test1');
        $Res = mysql_query('SELECT * FROM table1 WHERE id = 1', $DBH);
        mysql_close($DBH);
    }
    $Time = getmicrotime() - $time_start;
    ?>

PostgreSQL time
  0.15319013595581
  0.14930582046509
  0.14920592308044

    <?php
    $time_start = getmicrotime();
    for ($i = 0; $i < 20; $i++) {
        $DBH = pg_connect('dbname=test1 host=127.0.0.1');
        pg_close($DBH);
    }
    $Time = getmicrotime() - $time_start;
    ?>

PostgreSQL time (with simple query)
  0.19016313552856
  0.18785095214844
  0.18786096572876

    <?php
    $time_start = getmicrotime();
    for ($i = 0; $i < 20; $i++) {
        $DBH = pg_connect('dbname=test1 host=127.0.0.1');
        $Res = pg_query($DBH, 'SELECT * FROM table1 WHERE id = 1');
        pg_close($DBH);
    }
    $Time = getmicrotime() - $time_start;
    ?>
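The timings above are totals for 20 iterations, so it can help to reduce them to a per-connection cost and a ratio. The sketch below does only that arithmetic on the twelve numbers from this message; the variable and function names are illustrative, and it says nothing about where the time is actually spent.

```python
ITERATIONS = 20  # each script loops 20 times

runs = {
    "mysql connect": [0.012786865234375, 0.011546850204468, 0.01167106628418],
    "mysql connect+query": [0.015650987625122, 0.01443886756897, 0.014433860778809],
    "pgsql connect": [0.15319013595581, 0.14930582046509, 0.14920592308044],
    "pgsql connect+query": [0.19016313552856, 0.18785095214844, 0.18786096572876],
}


def mean(values):
    return sum(values) / len(values)


def per_iteration_ms(name):
    """Average milliseconds per loop iteration across the three runs."""
    return mean(runs[name]) / ITERATIONS * 1000


def ratio(slower, faster):
    """How many times longer `slower` takes than `faster`, on average."""
    return mean(runs[slower]) / mean(runs[faster])


if __name__ == "__main__":
    for name in runs:
        print("%-20s %.2f ms per iteration" % (name, per_iteration_ms(name)))
    print("connect only:  %.1fx" % ratio("pgsql connect", "mysql connect"))
    print("connect+query: %.1fx" % ratio("pgsql connect+query", "mysql connect+query"))
```

On these figures a PostgreSQL connect/disconnect costs about 7.5 ms against about 0.6 ms for MySQL, a ratio of roughly 12.5x. Note that this measures the whole connection setup, not fork() alone.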
[HACKERS] PostgreSQL pre-fork speedup
I had lots of trouble posting, so you may receive this more than once. My apologies.

--

Hi,

I know the issue of pre-forking PostgreSQL has been discussed previously. Someone mentioned that pre-fork could be implemented once schemas became available in PostgreSQL, because there would be less need to run multiple databases. I think Oracle 7 uses pre-forking, and it helps speed up the startup time considerably.

Often there are cases where connection pooling or persistent connections cannot be used efficiently (e.g. replicated or split databases over hundreds of machines, or where persistent connections open up too many idle connections). Instead, there's a big need to create a new connection on every query, and with PostgreSQL needing to fork on every incoming connection this can be quite slow.

This could be a big win, since even a moderate improvement at the connection level will affect almost every user. Any chance of that happening for 7.5?

Thanks.