It's the time to parse statements, plan and execute them, the round trips with the client, the context switches, the time for your client library to escape and encode the data and for postgres to decode it, etc. In a word: OVERHEAD.
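
If you do have to send row-at-a-time INSERTs, a prepared statement at least skips the per-row parse/plan step (the round trips and context switches remain). A minimal sketch against the test table used below; the statement name ins_test is just something I made up:

-- prepare once: parse + plan happen only here
PREPARE ins_test (int, int, int, int, int, int) AS
    INSERT INTO test (a, b, c, d, e, f) VALUES ($1, $2, $3, $4, $5, $6);

-- each row is then just an EXECUTE
EXECUTE ins_test(1, 2, 3, 4, 5, 42);
EXECUTE ins_test(1, 2, 3, 4, 5, 43);

DEALLOCATE ins_test;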

I know there is some overhead, but that much when running it batched...?

        Well, yeah ;)

Unfortunately it's not fast enough; it needs to be done in no more than 1-2 seconds (and in production it will be maybe 20-50 columns of data, perhaps divided over 5-10 tables). Additionally, it needs to scale to perhaps three times as many columns and 2-3 times as many rows in some situations, within 1 second. Furthermore, it needs to allow for about 20-50 clients reading much of that data before the next batch of data arrives.

        Wow. What is the application?

Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID1 (pretty slow):

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 721.579 ms

OK, so you see, insert speed is pretty fast. With a better CPU and faster disks, you can get a lot more.

test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms

test=> ALTER TABLE test ADD PRIMARY KEY (f);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey" for table "test"
ALTER TABLE
Time: 100.577 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 1915.928 ms

        This includes the time to update the index.
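
If the load pattern allows it, a common trick is to do the bulk load without the index and add the primary key afterwards, so the index is built once instead of being updated 100K times. Roughly (same table and data as above):

CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
-- build the index once, after the data is in
ALTER TABLE test ADD PRIMARY KEY (f);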

test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms

test=> CREATE OR REPLACE FUNCTION test_insert( )
    RETURNS VOID
    LANGUAGE plpgsql
    AS
$$
DECLARE
    _i INTEGER;
BEGIN
    FOR _i IN 0..100000 LOOP
        INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
    END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
 test_insert
-------------

(1 row)

Time: 1885.382 ms

Now you see, performing 100K individual inserts inside a plpgsql function is also fast. The postgres engine is pretty damn fast; it's the communication overhead that you feel, especially switching between client and server processes.
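
Note that the function body also runs as a single transaction. If your client sends individual INSERTs with autocommit on, each row pays for its own COMMIT (and the xlog fsync that goes with it); wrapping the batch in one explicit transaction avoids that, something like:

BEGIN;
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5,1);
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5,2);
-- ... the rest of the batch ...
COMMIT;  -- one commit for the whole batch instead of one per row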

        Another example:

=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms

VALUES is actually pretty fast. Here there is no context switch; everything is done in one INSERT.
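
For reference, the multi-row VALUES form (available since 8.2) looks like this, just with 100000 tuples instead of three:

INSERT INTO test (a,b,c,d,e,f) VALUES
    (1,2,3,4,5,1),
    (1,2,3,4,5,2),
    (1,2,3,4,5,3);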

However, COPY is much faster, because its parsing and de-escaping of the data are cheaper. COPY is optimized for throughput.
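
Something like this (the path is made up; plain COPY reads the file on the server, and the default text format wants one tab-separated row per line):

-- /tmp/test_data.txt: one row per line, columns separated by tabs
COPY test (a, b, c, d, e, f) FROM '/tmp/test_data.txt';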

        So, advice:

For optimum throughput, have your application build chunks of data into text files and use COPY. Or, if your client lib supports the COPY interface, use that. You will need a fast disk subsystem with the xlog and the data on separate disks, several CPU cores (one insert thread will max out one core, so keep the others for the selects), plenty of RAM so index updates don't have to seek, and some tuning of the bgwriter and checkpoints to avoid load spikes.
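
For example, psql's \copy goes through the same COPY machinery but reads the file on the client side, so the application only has to keep appending rows to a text file and ship it off chunk by chunk (the file name is just a placeholder):

-- load one prepared chunk through the client-side COPY interface
\copy test (a,b,c,d,e,f) FROM 'chunk_0001.txt'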
