Ow Mun Heng wrote:
> On Tue, 2007-08-28 at 08:19 +0100, Richard Huxton wrote:
>> Ow Mun Heng wrote:
>>> Continuing with my efforts to get similar functionality as mysql's
>>> mysqlimport --replace, I want to ask for the list's opinion on which
>>> is better:
>> I would suggest #3.
>> Don't forget to run a DELETE phase too, for rows that are no longer in
>> the import but are in your database.
>>> 3. \copy ; update ; insert
> No DELETE will _ever_ be done (hopefully, if there's enough disk
> space), for archival and tracking purposes etc. Theoretically, all the
> imported rows are either new rows or updated rows.
OK, so it's more a history table.
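For the archives, option 3 boils down to a load-then-merge. A minimal sketch, assuming a staging table `temp_table` and a target `main_table` keyed on `pkey` (all names hypothetical, and the column list would need to match the real schema):

```sql
-- 1. Load the fresh export into the staging table (run from psql):
--    \copy temp_table FROM 'export.csv' WITH CSV

-- 2. Update rows that already exist in the target:
UPDATE main_table m
SET    col1 = t.col1,
       col2 = t.col2
FROM   temp_table t
WHERE  m.pkey = t.pkey;

-- 3. Insert rows that are genuinely new:
INSERT INTO main_table
SELECT t.*
FROM   temp_table t
WHERE  NOT EXISTS (SELECT 1 FROM main_table m WHERE m.pkey = t.pkey);
```

Running steps 2 and 3 inside one transaction keeps readers from seeing a half-merged import.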
>>> btw, pkey was actually a multiple-column composite key which I've
>>> concatenated together as varchar, to make it simpler for the
>>> insertion/update/deletion
>> What gets complicated with a multi-column key?
> Less typing per insert/update statement, so it'll be
>   WHERE a.pkey = b.pkey
> instead of
>   a.key1 = b.key1 AND a.key2 = b.key2 AND ... up to key5
I'd still leave it alone, but it's your database.
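Keeping the natural composite key needn't cost much typing either: PostgreSQL accepts row-value comparisons, which keep the join terse without concatenating keys into a varchar. A sketch with hypothetical names `key1`..`key5`:

```sql
-- Row-value comparison: one predicate instead of five ANDed equalities.
UPDATE main_table a
SET    val = b.val
FROM   temp_table b
WHERE  (a.key1, a.key2, a.key3, a.key4, a.key5)
     = (b.key1, b.key2, b.key3, b.key4, b.key5);
```

The concatenated-varchar approach also risks ambiguity (e.g. `'a' || 'bc'` vs `'ab' || 'c'`) unless a separator is guaranteed never to appear in the key values.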
>> It's also better because it's what you actually want to do (in a
>> logical sense, you want to update existing rows). That means if you
>> ever attach triggers to the table, they will behave as you expect.
>>> Option 3 seems to be better because it will leave fewer dead tuples
>>> due to MVCC.
> Actually, what I want is mysqlimport --replace functionality, which is
> just to replace the whole row with new data. This DB will never have to
> deal with updates to specific rows/columns. No triggering is in the
> plan; if there is triggering, it will be to denormalise the table once
> the data import is finished. This, if it comes down to it, will be
> triggered via perl-DBI per import cycle. Which is why I like option 4.
>> If you're trying to keep two databases in sync then you need a process
>> to manage that. Don't try and make your import handle it - deal with
>> it properly.
> Nope.. Not trying to do that. No more, anyway.. The new DB will be just
> a subset of the main_db's data - whatever is needed by the engineers
> and nothing more. (new columns = new requests for "new" columns to be
> added)
I still think you'll need a system to handle this. What do you do when you "add" a column but the rows are deleted from the mssql server? You'll want some human-level rules at least.
> Option 4 is simpler: delete all the duplicate p_keys in the main_table
> before inserting the newest data (temp_table will _always_ have the
> newest unique data for that table).
> Besides being the illogical way of doing things, what else is wrong
> with option 4?
It'll do the job just fine.
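For comparison, option 4 is the delete-then-insert version of the same merge — closest in spirit to mysqlimport --replace, at the cost of turning every replaced row into a dead tuple. A sketch with the same hypothetical names:

```sql
-- Option 4: replace whole rows. Wrap in a transaction so a failed
-- import doesn't leave the keys deleted but not re-inserted.
BEGIN;

DELETE FROM main_table m
USING  temp_table t
WHERE  m.pkey = t.pkey;

INSERT INTO main_table
SELECT * FROM temp_table;

COMMIT;
```

Under MVCC the DELETE doesn't reclaim space immediately, which is where the vacuuming discussion below the fold comes from.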
>> Yes, you should make sure you are vacuuming at least once per import
>> cycle. However, as long as you have enough space in your
>> free-space-map, any dead-tuple space will be re-used.
> On average, a day will have between 5K and 250K unique rows; an import
> cycle will pull between 100 and 1000 rows. I don't think a vacuum per
> import cycle is needed. Do I?
Well, on a recent version I'd hope autovacuum is handling it for you. Basically there's no point in vacuuming more than once per import (assuming you don't have any more updates). If you vacuum once per day then you'll need to allow space for 5000+ rows.
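If autovacuum isn't trusted to keep up, the manual version is a one-liner run after each import cycle (table name hypothetical):

```sql
-- VACUUM makes dead-tuple space reusable; ANALYZE refreshes the
-- planner's statistics after the newly inserted rows.
VACUUM ANALYZE main_table;
```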
>> The other option would be #5:
>> 1. Copy the whole table from MSSQL to PG, but under a new name (t1_tmp)
>> 2. Rename t1 => t1_old, t1_tmp => t1
>> 3. Drop t1_old
>> If you don't have any foreign keys referring to the table, or views
>> depending on it, then that is probably simplest, although least
>> efficient.
> Actually, I can't do that. The main DB's data is being deleted every
> day, and I want to archive the old data (to be used for analysis etc.)
> as well.
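The rename-swap in option 5 can be done atomically, since PostgreSQL's DDL is transactional. A sketch using the `t1`/`t1_tmp` names from the steps above:

```sql
-- After t1_tmp has been loaded with the fresh copy:
BEGIN;
ALTER TABLE t1 RENAME TO t1_old;
ALTER TABLE t1_tmp RENAME TO t1;
DROP TABLE t1_old;
COMMIT;
```

Concurrent readers see either the old table or the new one, never neither — but, as noted, any foreign keys or views still point at the dropped table, and the old data is gone, which is why it doesn't fit the archival requirement here.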
OK - I think I see what you're doing now.

Regarding the "new" columns - I'd be tempted to copy all the columns over and perhaps hide them with a view at the PG end of things.
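Hiding the extra columns behind a view would look something like this (view and column names purely illustrative):

```sql
-- Copy every column across, but expose only what the engineers asked
-- for. A "new column" request then becomes a cheap CREATE OR REPLACE
-- VIEW rather than a schema change plus re-import.
CREATE VIEW t1_for_engineers AS
SELECT pkey, col1, col2
FROM   t1;
```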
--
Richard Huxton
Archonet Ltd
