On Tue, 2005-07-26 at 10:50 -0600, Dan Harris wrote:
> I am working on a process that will be inserting tens of million rows  
> and need this to be as quick as possible.
> 
> The catch is that for each row I could potentially insert, I need to  
> look and see if the relationship is already there  to prevent  
> multiple entries.  Currently I am doing a SELECT before doing the  
> INSERT, but I recognize the speed penalty in doing to operations.  I  
> wonder if there is some way I can say "insert this record, only if it  
> doesn't exist already".  To see if it exists, I would need to compare  
> 3 fields instead of just enforcing a primary key.
> 
> Even if this could be a small increase per record, even a few percent  
> faster compounded over the whole load could be a significant reduction.
> 
> Thanks for any ideas you might have.
> 

Perhaps a trigger:

CREATE FUNCTION verify_unique() RETURNS TRIGGER AS $func$
BEGIN
        PERFORM a,b,c FROM table1 WHERE a = NEW.a and b = NEW.b and c = NEW.c;
        IF FOUND THEN 
                RETURN NULL;
        END IF;
        RETURN NEW;
END;
$func$ LANGUAGE plpgsql STABLE;

CREATE TRIGGER verify_unique BEFORE INSERT ON table1 FOR EACH ROW
EXECUTE PROCEDURE verify_unique();

Triggers are fired on COPY commands and if table1 is able to be cached
and you have an index on table1(a,b,c) the results should be fairly
decent. I would be interested in seeing the difference in timing between
this approach and the temp table approach.


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly

Reply via email to