Re: [GENERAL] COPY from .csv File and Remove Duplicates

David Johnston Thu, 11 Aug 2011 16:51:12 -0700

On Aug 11, 2011, at 19:13, Rich Shepard <[email protected]> wrote:


>  A table has a sequence to generate a primary key for inserted records with
> NULLs in that column.
> 
>  I have a .csv file of approximately 10k rows to copy into this table. My
> two questions which have not been answered by reference to my postgres
> reference book or Google searches are:
> 
>  1) Will the sequence automatically add the nextval() to each new record as
> the copy command runs?
> 
>  2) Many of these rows almost certainly are already in the table. I would
> like to remove duplicates either during the COPY command or immediately
> after. I'm considering copying the new data into a clone of the table then
> running a SELECT to add only those rows in the new cloned table to the
> existing table.
> 
>  Suggestions on getting these data in without duplicating existing rows
> will be very helpful.
> 

If you have duplicates with matching real keys inserting into a staging table 
and then moving new records to the final table is your best option (in general 
it is better to do a two-step with a staging table since you can readily use 
Postgresql to perform any intermediate translations)  As for the import itself, 
I believe if the column with the sequence is present in the csv the sequence 
will not be used and, if no value is present, a null will be stored for that 
column - causing any not-null constraint to throw an error.  In this case I 
would just import the data to a staging table without any kind of artificial 
key, just the true key, and then during the merge with the live table you 
simply omit the pk field from the insert statement and the sequence will kick 
in at that point.

David J.


-- 
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] COPY from .csv File and Remove Duplicates

Reply via email to