Thanks for the reply Steve. These suggestions are new to me, so I'd like to 
rephrase them back to you in order to make sure I understand the bits and 
details. 



On Dec 19, 2012, at 10:48 AM, Steve Crawford <scrawf...@pinpointresearch.com> 
wrote:
>> 
> I suppose you could use a trigger to check each record before inserting but 
> that is likely to be inefficient for bulk loads. A quick bash loop is 
> probably your best bet. Something along the lines of:
> 
> for inputfile in /infiledirectory/*.csv
> do
>    cat inputfile | psql [connection-params] -c '\copy rawinput from stdin csv 
> header...'
> done

I get this, except for the "header…" bit. Here is my interpretation of your 
code, including my specific connection parameters. 

#!/bin/sh
for inputfile in /Volumes/disk7/b4warmed3/export/60min2/*.txt
do
   cat "$inputfile" | psql -p 54321 -h localhost -c "\copy rawinput FROM stdin WITH CSV HEADER DELIMITER AS ',' NULL AS 'NA'"
done

I left the glob unquoted so the shell actually expands *.txt, and put double quotes around the whole \copy command since it already contains single quotes. Does that look right?

> 
> This imports everything into a "staging" table (I called it rawinput). From 
> there you can create your final table with SELECT DISTINCT…

This bit needs to be a separate step, right? (rowid is the primary key.)

SELECT DISTINCT ON (rowid) *
        FROM rawinput;

From here do I need another COPY FROM or some kind of INSERT statement? 
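If I had to guess, it is a single INSERT … SELECT into the real table rather than another COPY — something like the following, where obs_60min is just a placeholder name for my final table:

INSERT INTO obs_60min              -- final (logged) table, placeholder name
SELECT DISTINCT ON (rowid) *       -- keep one row per rowid
FROM rawinput
ORDER BY rowid;                    -- without an ORDER BY, which duplicate wins is arbitrary

Or did you have CREATE TABLE … AS SELECT in mind?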

> 
> For speed make sure that you create your staging table as "unlogged".

I understand that I need to create the rawinput table first, but I am 
unfamiliar with the "UNLOGGED" option. I assume it makes things faster… Does it 
go something like:

CREATE UNLOGGED TABLE rawinput; 

Do I need to create all the variables (columns and their types) in rawinput as 
well? If so, then I assume rowid should not have a primary key constraint 
there… or else I would be back where I started. 
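Just to make sure I am picturing it correctly, something like this? (The columns other than rowid are placeholders until I write out the real list.)

CREATE UNLOGGED TABLE rawinput (
    rowid     text,        -- no PRIMARY KEY here, so duplicate rows can still be loaded
    obs_time  timestamp,   -- placeholder column
    temp_c    numeric      -- placeholder column
);                         -- unlogged: no WAL, so faster loads, but not crash-safe

And then the primary key on rowid only goes on the final, logged table?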


> 
> Cheers,
> Steve
> 

Thanks again,

Kirk

