Richard Huxton wrote:

On Thursday 25 September 2003 02:23, Curtis Stanford wrote:

I'm in a situation where I need to load an ASCII file into a database. No
sweat so far, right? The records are indexed by date and I have a unique
index on date. The ASCII file can overlap, meaning it can contain duplicate
dates that have been loaded before.

I started out with this ingenious idea. Just try to insert the record. If I
get an error about duplicate keys, do an update and switch to update mode.
If I'm updating and get zero rows updated, switch back to insert. That works
fine, except the whole thing is in a transaction, and any error aborts the
transaction so I can't commit it. So I decided I'd have to try to update
every single record and, if zero rows were updated, do the insert. This
takes a looooong time: many hours for just 86000 records or so.
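
In code, what I'm doing now is roughly this (Python with psycopg2; the
readings(date, value) table, the tab-separated file format, and the
connection string are all made up for illustration):

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

with open("data.txt") as f:                      # hypothetical ASCII file
    for line in f:
        date, value = line.rstrip("\n").split("\t", 1)
        cur.execute("UPDATE readings SET value = %s WHERE date = %s",
                    (value, date))
        if cur.rowcount == 0:                    # no row for this date yet
            cur.execute("INSERT INTO readings (date, value) VALUES (%s, %s)",
                        (date, value))

conn.commit()

One UPDATE round trip per row (two for every new row) is what makes this
so slow.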

Is there a standard way to do this? I can't imagine I'm the only guy who
needs to do this kind of thing.


Try inserting a batch of 1024 records. If you get an error, drop down to 512 and try that. Repeat until the batch works or you've reached a size of 1.
If the batch worked, try the next set of records and repeat. If you reached a batch size of 1 with no success, switch to updating, and repeat the cycle, increasing your batch size as you go.


You might find it quickest to halve the batch size while you're hitting problems and double it while things work. The right balance will depend on how many insert vs. update rows you have.
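
In Python with psycopg2 that might look something like this (same made-up
readings(date, value) table as above; an illustration, not a drop-in
loader):

import psycopg2

INSERT = "INSERT INTO readings (date, value) VALUES (%s, %s)"
UPDATE = "UPDATE readings SET value = %s WHERE date = %s"

def load(conn, records):
    """records is a list of (date, value) tuples parsed from the file."""
    i, size = 0, 1024
    while i < len(records):
        batch = records[i:i + size]
        try:
            cur = conn.cursor()
            cur.executemany(INSERT, batch)
            conn.commit()                  # the whole batch was new rows
            i += len(batch)
            size = min(size * 2, 1024)     # grow again while it works
        except psycopg2.IntegrityError:
            conn.rollback()                # a duplicate is somewhere in here
            if size > 1:
                size //= 2                 # shrink to isolate it
            else:
                date, value = batch[0]     # size 1: this row must exist
                cur = conn.cursor()
                cur.execute(UPDATE, (value, date))
                conn.commit()
                i += 1
                size = 2                   # start growing again

The win comes from committing big batches when most of the file is new; the
cost is re-sending a failed batch a few times while narrowing in on a
duplicate.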


In addition to that, you can try inserting from multiple backends simultaneously to speed up the whole process.
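
For instance (multiprocessing plus psycopg2, same assumed table; routing
each date to a fixed worker is just one way to keep two backends from
racing to insert the same key):

import multiprocessing
import psycopg2

N_WORKERS = 8                                    # tune to your hardware

def load_chunk(chunk):
    conn = psycopg2.connect("dbname=test")       # one backend per worker
    cur = conn.cursor()
    for date, value in chunk:
        cur.execute("UPDATE readings SET value = %s WHERE date = %s",
                    (value, date))
        if cur.rowcount == 0:
            cur.execute("INSERT INTO readings (date, value) VALUES (%s, %s)",
                        (date, value))
    conn.commit()
    conn.close()

if __name__ == "__main__":
    with open("data.txt") as f:
        records = [tuple(line.rstrip("\n").split("\t", 1)) for line in f]
    # hash the date so duplicates always land in the same worker
    chunks = [[] for _ in range(N_WORKERS)]
    for rec in records:
        chunks[hash(rec[0]) % N_WORKERS].append(rec)
    with multiprocessing.Pool(N_WORKERS) as pool:
        pool.map(load_chunk, chunks)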


And I don't like the mode-switching idea the OP gave. I would rather do insert-then-update-on-error in a separate transaction for each record, and fork, say, 20-40 parallel backends to get good speed.
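
That per-record flow might look like this (psycopg2 again, same
hypothetical table):

import psycopg2

def upsert_record(conn, date, value):
    cur = conn.cursor()
    try:
        cur.execute("INSERT INTO readings (date, value) VALUES (%s, %s)",
                    (date, value))
    except psycopg2.IntegrityError:
        conn.rollback()                 # aborts only this record's transaction
        cur.execute("UPDATE readings SET value = %s WHERE date = %s",
                    (value, date))
    conn.commit()                       # one small transaction per record

Because each record commits on its own, one duplicate-key error never
poisons the rest of the load, and each backend spends most of its time
waiting on its own commit, which is exactly why running 20-40 of them in
parallel helps.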

Shridhar

