[GENERAL] Multiple COPY statements

2012-05-10 Thread Lee Hachadoorian
Does anyone have experience or advice on how to efficiently issue a
large number of COPY statements? The data (US Census) comes in  100
segments (each will be copied to its own database tables) for each
state (51), for a total of  5000 text files. I can generate the COPY
statements with a script.

The two specific question I can think of (but I'm sure there's more
that I'm not thinking of) are:

1) COPY is fastest when used within the same transaction as an
earlier CREATE TABLE or TRUNCATE command. In such cases no WAL needs
to be written, because in case of an error, the files containing the
newly loaded data will be removed anyway. Would I be able to take
advantage of this if I:

BEGIN;
TRUNCATE import_table;
COPY import_table FROM 'file1';
COPY import_table FROM 'file2';
...
COPY import_table FROM 'file51';
END;

2) Is there a performance hit to doing a COPY to more than one table
in the same transaction?

Any other advice will be appreciated.

Regards,
--Lee

-- 
Lee Hachadoorian
PhD, Earth  Environmental Sciences (Geography)
Research Associate, CUNY Center for Urban Research
http://freecity.commons.gc.cuny.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Multiple COPY statements

2012-05-10 Thread Andy Colson

On 5/10/2012 1:10 PM, Lee Hachadoorian wrote:

Does anyone have experience or advice on how to efficiently issue a
large number of COPY statements? The data (US Census) comes in  100
segments (each will be copied to its own database tables) for each
state (51), for a total of  5000 text files. I can generate the COPY
statements with a script.

The two specific question I can think of (but I'm sure there's more
that I'm not thinking of) are:

1) COPY is fastest when used within the same transaction as an
earlier CREATE TABLE or TRUNCATE command. In such cases no WAL needs
to be written, because in case of an error, the files containing the
newly loaded data will be removed anyway. Would I be able to take
advantage of this if I:

BEGIN;
TRUNCATE import_table;
COPY import_table FROM 'file1';
COPY import_table FROM 'file2';
...
COPY import_table FROM 'file51';
END;


Yes, I believe so.



2) Is there a performance hit to doing a COPY to more than one table
in the same transaction?


No, I don't think so.  I assume you are the only user hitting the 
import_table, so holding one big transaction wont hurt anything.




Any other advice will be appreciated.


To really speed it up, you'd need to run multiple concurrent connections 
each doing COPY's.  Maybe up to the number of cores you have.  (of 
course you dont want each connection to fire off truncates, but 
concurrent should trump skip wall in terms of speed).


If import_table is just a temp holding stot you can look into temp 
and/or unlogged tables.



-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Multiple COPY statements

2012-05-10 Thread Lee Hachadoorian
On Thu, May 10, 2012 at 2:42 PM, Andy Colson a...@squeakycode.net wrote:
 On 5/10/2012 1:10 PM, Lee Hachadoorian wrote:

 2) Is there a performance hit to doing a COPY to more than one table
 in the same transaction?


 No, I don't think so.  I assume you are the only user hitting the
 import_table, so holding one big transaction wont hurt anything.

Actually what I mean is that there are multiple import tables,
import_table1 ... import_table100. But it is true that I would be the
only user hitting the import tables.

 Any other advice will be appreciated.


 To really speed it up, you'd need to run multiple concurrent connections
 each doing COPY's.  Maybe up to the number of cores you have.  (of course
 you dont want each connection to fire off truncates, but concurrent should
 trump skip wall in terms of speed).

 If import_table is just a temp holding stot you can look into temp and/or
 unlogged tables.

Yes, it is a staging table, data needs to be manipulated before
shunting to its desired destination. I think unlogged tables will be
helpful, and if I understand correctly then I wouldn't need to use the
BEGIN; TRUNCATE; COPY...; END; trick. And would unlogged + concurrent
connections work together?

--Lee

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Multiple COPY statements

2012-05-10 Thread Andy Colson

On 5/10/2012 2:00 PM, Lee Hachadoorian wrote:

On Thu, May 10, 2012 at 2:42 PM, Andy Colsona...@squeakycode.net  wrote:

On 5/10/2012 1:10 PM, Lee Hachadoorian wrote:


2) Is there a performance hit to doing a COPY to more than one table
in the same transaction?



No, I don't think so.  I assume you are the only user hitting the
import_table, so holding one big transaction wont hurt anything.


Actually what I mean is that there are multiple import tables,
import_table1 ... import_table100. But it is true that I would be the
only user hitting the import tables.


Any other advice will be appreciated.



To really speed it up, you'd need to run multiple concurrent connections
each doing COPY's.  Maybe up to the number of cores you have.  (of course
you dont want each connection to fire off truncates, but concurrent should
trump skip wall in terms of speed).

If import_table is just a temp holding stot you can look into temp and/or
unlogged tables.


Yes, it is a staging table, data needs to be manipulated before
shunting to its desired destination. I think unlogged tables will be
helpful, and if I understand correctly then I wouldn't need to use the
BEGIN; TRUNCATE; COPY...; END; trick. And would unlogged + concurrent
connections work together?

--Lee




Oh yes.  concurrent + unlogged would be the best of all worlds.

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general