[GENERAL] inserting only new rows from csv file

2008-10-01 Thread Seb
Hi,

I've seen some approaches to this problem, but I suspect there are
better ways.

A csv file is produced nightly, with the same table structure as a table
in a PostgreSQL database.  The file grows over time as new records are
added every day.  However, the rows in the file may not be sorted the
same way every day.

I need to isolate the new rows and insert them into the database table.
Before I start writing a script to do this, I need to decide whether to
do the comparison between the database table and the csv file: a) within
the database, or b) COPY'ing the table to a file and then do the
comparison.  I'm not very experienced with SQL, so am more comfortable
with the latter option using plain shell and awk.  Which route is best?
If a), I'd appreciate some SQL code to do it.  Thanks.


Cheers,

-- 
Seb


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] inserting only new rows from csv file

2008-10-01 Thread Scott Marlowe
On Wed, Oct 1, 2008 at 3:16 PM, Seb [EMAIL PROTECTED] wrote:
 Hi,

 I've seen some approaches to this problem, but I suspect there are
 better ways.

 A csv file is produced nightly, with the same table structure as a table
 in a PostgreSQL database.  The file grows over time as new records are
 added every day.  However, the rows in the file may not be sorted the
 same way every day.

 I need to isolate the new rows and insert them into the database table.
 Before I start writing a script to do this, I need to decide whether to
 do the comparison between the database table and the csv file: a) within
 the database, or b) COPY'ing the table to a file and then do the
 comparison.  I'm not very experienced with SQL, so am more comfortable
 with the latter option using plain shell and awk.  Which route is best?
 If a), I'd appreciate some SQL code to do it.  Thanks.

I'd load them into a staging table, and use a query (either a
subselect or a left join where null type query) to load them into the
master table.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] inserting only new rows from csv file

2008-10-01 Thread Harvey, Allan AC


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of Scott Marlowe
 Sent: Thursday, 2 October 2008 7:39 AM
 To: Seb
 Cc: pgsql-general@postgresql.org
 Subject: Re: [GENERAL] inserting only new rows from csv file
 
 
 On Wed, Oct 1, 2008 at 3:16 PM, Seb [EMAIL PROTECTED] wrote:
  Hi,
 
  I've seen some approaches to this problem, but I suspect there are
  better ways.
 
  A csv file is produced nightly, with the same table 
 structure as a table
  in a PostgreSQL database.  The file grows over time as new 
 records are
  added every day.  However, the rows in the file may not be 
 sorted the
  same way every day.
 
  I need to isolate the new rows and insert them into the 
 database table.
  Before I start writing a script to do this, I need to 
 decide whether to
  do the comparison between the database table and the csv 
 file: a) within
  the database, or b) COPY'ing the table to a file and then do the
  comparison.  I'm not very experienced with SQL, so am more 
 comfortable
  with the latter option using plain shell and awk.  Which 
 route is best?
  If a), I'd appreciate some SQL code to do it.  Thanks.
 
 I'd load them into a staging table, and use a query (either a
 subselect or a left join where null type query) to load them into the
 master table.
 
I solved the same problem this way.
csv file processed by gawk script to produce SQL script with inserts etc. Then 
pass
the SQL file to psql.

This is what I have in my BEGIN line of my gawk script leaving out appropriate 
print statements for clarity.

CREATE TEMPORARY TABLE tempjobdetails( LIKE jobdetails INCLUDING DEFAULTS );

--Do load from csv into tempjobdetails

This is in my END line of the gawk scrip ditto the print statements.

insert into jobdetails
   ( select * from tempjobdetails
 where
 ( tempjobdetails.jobnum, tempjobdetails.opnum )
 not in
 ( select
   jobdetails.jobnum, jobdetails.opnum from jobdetails
  )
);

Hope this helps.

Allan


The material contained in this email may be confidential, privileged or 
copyrighted. If you are not the intended recipient, use, disclosure or copying 
of this information is prohibited. If you have received this document in error, 
please advise the sender and delete the document. Neither OneSteel nor the 
sender accept responsibility for any viruses contained in this email or any 
attachments.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general