On Tue, 2005-07-26 at 10:50 -0600, Dan Harris wrote: > I am working on a process that will be inserting tens of million rows > and need this to be as quick as possible. > > The catch is that for each row I could potentially insert, I need to > look and see if the relationship is already there to prevent > multiple entries. Currently I am doing a SELECT before doing the > INSERT, but I recognize the speed penalty in doing to operations. I > wonder if there is some way I can say "insert this record, only if it > doesn't exist already". To see if it exists, I would need to compare > 3 fields instead of just enforcing a primary key. > > Even if this could be a small increase per record, even a few percent > faster compounded over the whole load could be a significant reduction. > > Thanks for any ideas you might have. >
Perhaps a trigger: CREATE FUNCTION verify_unique() RETURNS TRIGGER AS $func$ BEGIN PERFORM a,b,c FROM table1 WHERE a = NEW.a and b = NEW.b and c = NEW.c; IF FOUND THEN RETURN NULL; END IF; RETURN NEW; END; $func$ LANGUAGE plpgsql STABLE; CREATE TRIGGER verify_unique BEFORE INSERT ON table1 FOR EACH ROW EXECUTE PROCEDURE verify_unique(); Triggers are fired on COPY commands and if table1 is able to be cached and you have an index on table1(a,b,c) the results should be fairly decent. I would be interested in seeing the difference in timing between this approach and the temp table approach. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly