Since you have lots of data, you can use parallel loading.

Split your data into several files and then do:

CREATE TEMPORARY TABLE loader1 ( ... )
COPY loader1 FROM ...

Use a TEMPORARY TABLE for this: you don't need crash recovery, since if something blows up you can just COPY the file again... and it will be much faster, because no WAL is written for temporary tables.
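
Something like this, as a minimal sketch of a single loader session in Python with psycopg2 (untested; the connection string, column layout, and file path are just placeholders, adapt them to your schema):

import psycopg2

DSN = "dbname=mydb user=me"     # hypothetical connection string
CHUNK = "/data/chunk1.csv"      # one of the split files

conn = psycopg2.connect(DSN)
cur = conn.cursor()

# Temp tables are session-local and generate no WAL: if the session
# dies, the table simply vanishes and you just re-run the COPY.
cur.execute("CREATE TEMPORARY TABLE loader1 (id bigint, payload text)")

with open(CHUNK) as f:
    cur.copy_expert("COPY loader1 FROM STDIN WITH (FORMAT csv)", f)

conn.commit()
# NB: don't close the connection yet -- the temp table (and your data)
# exists only inside this session.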

If your disk is fast, COPY is CPU-bound, so if you run one COPY process per core and avoid writing WAL, it will scale.
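
The fan-out could look like this, assuming the dump is already split into one CSV per core (file names, pool size, and schema are again made up):

import multiprocessing
import psycopg2

DSN = "dbname=mydb user=me"
CHUNKS = ["/data/chunk1.csv", "/data/chunk2.csv",
          "/data/chunk3.csv", "/data/chunk4.csv"]   # one file per core

def copy_chunk(path):
    # One backend per worker: the COPYs run on separate cores, and
    # each worker's temp table stays WAL-free.
    conn = psycopg2.connect(DSN)
    cur = conn.cursor()
    cur.execute("CREATE TEMPORARY TABLE loader1 (id bigint, payload text)")
    with open(path) as f:
        cur.copy_expert("COPY loader1 FROM STDIN WITH (FORMAT csv)", f)
    # The dedup INSERT (sketched further down) must also run here, in
    # this same session: a temp table is invisible to other backends.
    conn.commit()
    conn.close()

if __name__ == "__main__":
    with multiprocessing.Pool(processes=len(CHUNKS)) as pool:
        pool.map(copy_chunk, CHUNKS)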

This doesn't solve the other half of your problem (removing the duplicates), which isn't easy to parallelize, but it will make the COPY part a lot faster.

Note that you can have one core working on the INSERT / duplicate removal while the others are busy with COPY and filling temp tables, so if you pipeline it you could save some time.
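
As a sketch of that final step (target table name and dedup key are assumptions): since a temp table is visible only to the session that created it, each worker has to push its own rows into the real table before disconnecting, e.g. by calling something like this at the end of copy_chunk(), before conn.close():

def flush_dedup(conn):
    # Must run on the same connection that did the COPY: the temp
    # table loader1 is invisible to every other backend.
    cur = conn.cursor()
    cur.execute("""
        INSERT INTO target (id, payload)
        SELECT DISTINCT ON (id) id, payload
        FROM loader1 AS l
        WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = l.id)
    """)
    conn.commit()

Two caveats: NOT EXISTS won't see rows another session has inserted but not yet committed, so with concurrent workers a unique index on target(id) is the real guarantee; and this INSERT is the part that does write WAL, which is why it tends to be the serial bottleneck the pipelining works around.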

Does your data contain a lot of duplicates, or are they rare? What percentage?
