On 11 Sep 2010, at 6:10, Scott Bailey wrote: > On 09/10/2010 08:07 PM, t...@exquisiteimages.com wrote: >> I have a situation where I receive a file with transactions that have a >> unique key from a vendor. These transactions should only be imported into >> my system once, but the vendor system will occasionally resend a >> transaction by mistake. >> >> The way I am currently handling this with Micorosft ADO and FoxPro files >> is to open a table with an index on the vendor key and seek on the key. If >> there is no match I add it, if there is a match I put it in an exception >> file to be manually checked. >> >> Using PostgreSQL I can't open a table and do seeks against an index. I >> could do a select against the database and see if 0 records are returned, >> but that seems to take more time than doing a seek on an index. Is there a >> more SQL friendly way of handling this task? > > Postgres isn't going to just use the index because it needs the visibility > information in the table. But it will be better to load all of the data into > a staging table using COPY and then insert the missing rows from there. It > will be a ton faster than going a row at a time, looking for a match then > doing an insert.
This is probably the best way to go about this. Basically you perform: BEGIN; -- read in data-file COPY staging_table FROM STDIN; Your data here \. -- delete duplicates DELETE FROM staging_table USING live_table WHERE live_table.key = staging_table.key; -- insert remaining data (non-duplicates) INSERT INTO live_table (key, data1, data2, etc) SELECT key, data1, data2, etc FROM staging_table; COMMIT; You could add a step before deleting duplicates that would: INSERT INTO duplicate_table (key, data1, data2, etc) SELECT key, data1, data2, etc FROM staging_table WHERE EXISTS (SELECT 1 FROM live_table WHERE key = staging_table.key); The following are a few would-be-nice-to-have's that AFAIK aren't possible yet. Often the reason we don't have these is the SQL standard, which is a pretty good reason. Still... It would be great to be able to use a WITH statement to lock down a data set for multiple subsequent operations, something like: WITH nonduplicates (key, data1, data2, etc) AS ( SELECT key, data1, data2, etc FROM staging_table EXCEPT SELECT key, data1, data2, etc FROM live_table ) INSERT INTO live_table (key, data1, data2, etc) SELECT key, data1, data2, etc FROM nonduplicates RETURNING key, data1, data2, etc UNION ALL DELETE FROM staging_table USING nonduplicates WHERE key = nonduplicates.key RETURNING key, data1, data2, etc; Or something like that. It's just an example from what I have in mind, after all ;) But of course for this particular situation it would be really ideal to be able to just do: MOVE * FROM staging_table TO live_table WHERE NOT EXISTS ( SELECT 1 FROM live_table WHERE key = staging_table.key ); Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4c8b557b10401521071037! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general