It's the time spent parsing statements, planning, executing, doing
round trips with the client, context switching, the time for your client
library to escape and encode the data and for postgres to decode it, etc.
In a word: OVERHEAD.
I know there is some overhead, but that much when running it batched...?
Well, yeah ;)
Unfortunately it's not fast enough; it needs to be done in no more than
1-2 seconds (and in production it will be maybe 20-50 columns of data,
perhaps divided over 5-10 tables).
Additionally it needs to scale to perhaps three times as many columns
and perhaps 2-3 times as many rows in some situations, within 1 second.
Further on, it needs to allow for about 20-50 clients reading much of
that data before the next batch of data arrives.
Wow. What is the application?
Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID1 (pretty
slow):
test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms
test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000
) as a;
INSERT 0 100000
Time: 721.579 ms
OK, so you see, insert speed is pretty fast. With a better CPU and faster
disks, you can get a lot more.
test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms
test=> ALTER TABLE test ADD PRIMARY KEY (f);
INFO: ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey"
for table "test"
ALTER TABLE
Time: 100.577 ms
test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000
) as a;
INSERT 0 100000
Time: 1915.928 ms
This includes the time to update the index.
test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms
test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms
test=> CREATE OR REPLACE FUNCTION test_insert( )
RETURNS VOID
LANGUAGE plpgsql
AS
$$
DECLARE
_i INTEGER;
BEGIN
FOR _i IN 0..100000 LOOP
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms
test=> SELECT test_insert();
test_insert
-------------
(1 row)
Time: 1885.382 ms
Now you see, performing 100K individual inserts inside a plpgsql function
is also fast.
The postgres engine is pretty damn fast; it's the communication overhead
that you feel, especially switching between client and server processes.
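To get a feel for why round trips dominate, here is a back-of-the-envelope sketch; the 50 µs per-round-trip figure is a hypothetical assumption for a local socket, not a measurement from this test:

```python
# Rough cost of one client/server round trip per statement vs. one batch.
# 50 microseconds per round trip is a hypothetical assumption (local socket).
rows = 100_000
round_trip_s = 50e-6

per_row_total = rows * round_trip_s      # one round trip per INSERT
batched_total = 1 * round_trip_s         # one round trip for the whole batch

print(f"per-row inserts: {per_row_total:.1f} s of round trips alone")
print(f"single batched statement: {batched_total * 1e3:.2f} ms")
```

Even before postgres does any work, per-statement round trips at that rate would eat several seconds for 100K rows.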
Another example:
=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Temps : 1836,458 ms
VALUES is actually pretty fast. Here there is no context switching;
everything is done in one INSERT.
However, COPY is much faster, because parsing and de-escaping the data
is cheaper. COPY is optimized for throughput.
So, my advice:
For optimum throughput, have your application build chunks of data into
text files and use COPY. Or, if your client library supports the COPY
interface, use it.
You will need a fast disk system with xlog and data on separate disks,
several CPU cores (1 insert thread will max out 1 core, use the others for
selects), lots of RAM so index updates don't need to seek, and tuning of
bgwriter and checkpoints to avoid load spikes.
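For reference, the knobs involved live in postgresql.conf; the values below are illustrative assumptions for an 8.x-era server, not tuned recommendations:

```
# postgresql.conf -- illustrative values, tune for your own hardware
shared_buffers = 512MB                 # keep hot index pages in RAM
wal_buffers = 1MB                      # WAL goes on the separate xlog disk
checkpoint_segments = 32               # spread checkpoints out
checkpoint_completion_target = 0.9     # smooth checkpoint I/O (8.3+)
bgwriter_delay = 200ms                 # background writer pacing
```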