Dan Harris wrote:
I am working on a process that will be inserting tens of millions of rows and need this to be as quick as possible.

The catch is that for each row I could potentially insert, I need to check whether the relationship is already there, to prevent duplicate entries. Currently I am doing a SELECT before the INSERT, but I recognize the speed penalty of doing two operations. I wonder if there is some way I can say "insert this record, only if it doesn't exist already". To see if it exists, I would need to compare 3 fields instead of just enforcing a primary key.

Even if this is only a small gain per record, a few percent compounded over the whole load could be a significant reduction.

Thanks for any ideas you might have.

-Dan


You could insert all of your data into a temporary table, and then do:

INSERT INTO final_table SELECT * FROM temp_table
  WHERE NOT EXISTS (SELECT 1 FROM final_table
                    WHERE final_table.id = temp_table.id
                      AND final_table.path = temp_table.path
                      AND final_table.y = temp_table.y);
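A minimal end-to-end sketch of that first approach, assuming PostgreSQL. The column names id, path and y come from the query above; the column types, the index name and the sample rows are hypothetical. It also adds DISTINCT, because NOT EXISTS only compares against rows already in final_table and would let two identical rows from the same batch both through.

```sql
-- Hypothetical schema: only the column names id, path, y come from the thread.
CREATE TABLE final_table (
    id   integer NOT NULL,
    path text    NOT NULL,
    y    integer NOT NULL
);
-- An index on the three match columns can speed up the existence check.
CREATE INDEX final_table_match_idx ON final_table (id, path, y);

INSERT INTO final_table VALUES (1, '/a', 10);   -- row loaded earlier

CREATE TEMP TABLE temp_table (LIKE final_table);
INSERT INTO temp_table VALUES
    (1, '/a', 10),   -- already in final_table: filtered out by NOT EXISTS
    (2, '/b', 20),   -- new row
    (2, '/b', 20);   -- repeated within the batch: collapsed by DISTINCT

-- Correlated anti-join: each staged row is checked against final_table.
INSERT INTO final_table (id, path, y)
SELECT DISTINCT t.id, t.path, t.y
FROM temp_table AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM final_table AS f
    WHERE f.id   = t.id
      AND f.path = t.path
      AND f.y    = t.y
);
```

After this runs, final_table holds the original row plus a single copy of (2, '/b', 20).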

Or you could load it into the temporary table, and then:
DELETE FROM temp_table WHERE EXISTS (SELECT 1 FROM final_table WHERE final_table.id = temp_table.id AND ...);

And then do a plain INSERT INTO.
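The delete-then-insert variant might look like this in PostgreSQL. As before, the column types and sample rows are hypothetical. Wrapping both steps in one transaction only makes them commit together; it does not stop another session from inserting a matching row in between.

```sql
-- Hypothetical schema and data for illustration.
CREATE TABLE final_table (id integer NOT NULL, path text NOT NULL, y integer NOT NULL);
INSERT INTO final_table VALUES (1, '/a', 10);

CREATE TEMP TABLE temp_table (LIKE final_table);
INSERT INTO temp_table VALUES (1, '/a', 10), (3, '/c', 30);

BEGIN;
-- Step 1: drop staged rows whose (id, path, y) already exists in final_table.
DELETE FROM temp_table AS t
WHERE EXISTS (
    SELECT 1
    FROM final_table AS f
    WHERE f.id = t.id AND f.path = t.path AND f.y = t.y
);
-- Step 2: everything left is new; DISTINCT guards against repeats within the batch.
INSERT INTO final_table SELECT DISTINCT * FROM temp_table;
COMMIT;
```

If other sessions may write to final_table during the load, a UNIQUE index on (id, path, y) is what would actually enforce uniqueness; both staging approaches assume a single writer.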

I can't say what the specific performance gains would be, but temp_table could certainly be an actual TEMP table (meaning it only exists for the duration of the connection), and you could easily COPY into that table to load it quickly, without indexes or uniqueness constraints to check along the way.
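A rough sketch of the loading step, assuming PostgreSQL and a script run through psql (inline COPY ... FROM STDIN data terminated by `\.` is a psql feature). The CSV rows and column types are hypothetical; a real load would stream from a file or client instead. The ANALYZE is there because autovacuum cannot process temporary tables, so the planner otherwise has no statistics for temp_table.

```sql
-- Hypothetical target table and existing data.
CREATE TABLE final_table (id integer NOT NULL, path text NOT NULL, y integer NOT NULL);
INSERT INTO final_table VALUES (1, '/a', 10);

-- TEMP: private to this session, dropped automatically when it ends.
-- LIKE copies the columns and NOT NULL constraints, but no indexes by default.
CREATE TEMP TABLE temp_table (LIKE final_table);

-- Bulk load without per-row index maintenance on the staging table.
COPY temp_table (id, path, y) FROM STDIN WITH (FORMAT csv);
1,/a,10
4,/d,40
\.

-- Give the planner row counts for the anti-join below.
ANALYZE temp_table;

INSERT INTO final_table (id, path, y)
SELECT DISTINCT t.id, t.path, t.y
FROM temp_table AS t
WHERE NOT EXISTS (
    SELECT 1 FROM final_table AS f
    WHERE f.id = t.id AND f.path = t.path AND f.y = t.y
);
```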

Just a thought,
John
=:->

