It's the time to parse statements, plan and execute them, the round trips with the client, the context switches, the time for your client library to escape and encode the data and for postgres to decode it, etc. In a word: OVERHEAD.
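
If you do have to send row-at-a-time INSERTs, a prepared statement at least skips the per-row parse/plan step (the round trips and context switches remain). A minimal sketch against the test table used below; the statement name ins_test is just something I made up:

-- prepare once: parse + plan happen only here
PREPARE ins_test (int, int, int, int, int, int) AS
    INSERT INTO test (a, b, c, d, e, f) VALUES ($1, $2, $3, $4, $5, $6);

-- each row is then just an EXECUTE
EXECUTE ins_test(1, 2, 3, 4, 5, 42);
EXECUTE ins_test(1, 2, 3, 4, 5, 43);

DEALLOCATE ins_test;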

I know there is some overhead, but that much when running it batched...?

        Well, yeah ;)

Unfortunately it's not fast enough; it needs to be done in no more than 1-2 seconds (and in production it will be maybe 20-50 columns of data, perhaps divided over 5-10 tables). Additionally, it needs to scale to perhaps three times as many columns and 2-3 times as many rows in some situations, within 1 second. Furthermore, it needs to allow for about 20-50 clients reading much of that data before the next batch of data arrives.

        Wow. What is the application?

Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID1 (pretty slow):

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 721.579 ms

OK, so you see, insert speed is pretty fast. With a better CPU and faster disks, you can get a lot more.

test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms

test=> ALTER TABLE test ADD PRIMARY KEY (f);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey" for table "test"
ALTER TABLE
Time: 100.577 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 1915.928 ms

        This includes the time to update the index.
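
If the load pattern allows it, a common trick is to do the bulk load without the index and add the primary key afterwards, so the index is built once instead of being updated 100K times. Roughly (same table and data as above):

CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
-- build the index once, after the data is in
ALTER TABLE test ADD PRIMARY KEY (f);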

test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms

test=> CREATE OR REPLACE FUNCTION test_insert( )
    RETURNS VOID
    LANGUAGE plpgsql
    AS
$$
DECLARE
    _i INTEGER;
BEGIN
    FOR _i IN 0..100000 LOOP
        INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
    END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
 test_insert
-------------

(1 row)

Time: 1885.382 ms

Now you see, performing 100K individual inserts inside a plpgsql function is also fast. The postgres engine is pretty damn fast; it's the communication overhead that you feel, especially switching between client and server processes.
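
Note that the function body also runs as a single transaction. If your client sends individual INSERTs with autocommit on, each row pays for its own COMMIT (and the xlog fsync that goes with it); wrapping the batch in one explicit transaction avoids that, something like:

BEGIN;
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5,1);
INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5,2);
-- ... the rest of the batch ...
COMMIT;  -- one commit for the whole batch instead of one per row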

        Another example:

=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms

VALUES is actually pretty fast. Here there is no context switch; everything is done in one INSERT.
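
For reference, the multi-row VALUES form (available since 8.2) looks like this, just with 100000 tuples instead of three:

INSERT INTO test (a,b,c,d,e,f) VALUES
    (1,2,3,4,5,1),
    (1,2,3,4,5,2),
    (1,2,3,4,5,3);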

However, COPY is much faster, because its parsing and de-escaping of the data are cheaper. COPY is optimized for throughput.
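
Something like this (the path is made up; plain COPY reads the file on the server, and the default text format wants one tab-separated row per line):

-- /tmp/test_data.txt: one row per line, columns separated by tabs
COPY test (a, b, c, d, e, f) FROM '/tmp/test_data.txt';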

        So, advice:

For optimum throughput, have your application build chunks of data into text files and use COPY. Or, if your client lib supports the COPY interface, use that. You will need a fast disk subsystem with the xlog and the data on separate disks, several CPU cores (one insert thread will max out one core, so keep the others for the selects), plenty of RAM so index updates don't have to seek, and some tuning of the bgwriter and checkpoints to avoid load spikes.
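
For example, psql's \copy goes through the same COPY machinery but reads the file on the client side, so the application only has to keep appending rows to a text file and ship it off chunk by chunk (the file name is just a placeholder):

-- load one prepared chunk through the client-side COPY interface
\copy test (a,b,c,d,e,f) FROM 'chunk_0001.txt'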
