I have been researching how to deal with duplicate data.  I have a
case where there can be duplicate data, and I want to get rid of it.
The general ideas I keep finding are as follows, plus my own, which I
would like opinions on since I have not seen it mentioned.

I am not in a situation where I can test for duplicates on insert, so
INSERT ... ON DUPLICATE KEY UPDATE does me no good here.
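
For context, that approach would look something like this (just a
sketch against a hypothetical my_table with a hypothetical other_col;
it only fires on a unique index over the duplicated column, which is
exactly what I cannot rely on):

-- needs a unique index on my_param for the duplicate check to fire
INSERT INTO my_table (my_param, other_col)
VALUES ('foo', 'bar')
ON DUPLICATE KEY UPDATE other_col = VALUES(other_col);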

Most of what I find points to selecting the distinct records into a
temp table, deleting the source records, and selecting back into the
original table: three steps.  I imagine there is a bit of overhead in
creating the temp table; after all, it is writing a small file to disk.
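
If I understand the advice, it goes roughly like this (a sketch,
assuming a hypothetical my_table where a duplicate means rows that are
identical in every column except the auto-increment pk id, with
columns my_param and other_col):

-- 1) keep the newest row of each duplicate group in a temp table
CREATE TEMPORARY TABLE tmp
  SELECT MAX(id) AS id, my_param, other_col
    FROM my_table
   WHERE my_param = 'foo'
   GROUP BY my_param, other_col;
-- 2) clear out the originals
DELETE FROM my_table WHERE my_param = 'foo';
-- 3) copy the de-duplicated rows back, then clean up
INSERT INTO my_table (id, my_param, other_col)
  SELECT id, my_param, other_col FROM tmp;
DROP TEMPORARY TABLE tmp;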

Say I want to favor newer records over old, and I have a pk of id.  Is
there any flaw in:

SELECT MAX(id) FROM my_table WHERE my_param = 'foo';
// that gives me the newest record's id
// a little client-side logic to read that id out of the result set
DELETE FROM my_table WHERE my_param = 'foo' AND id NOT IN (the_result);

So I have two steps, I save the temp table creation, AND I don't bother
doing anything at all in the case where there are only unique records to
begin with, i.e. nothing to delete.

Is there something wrong with this?  It seems a wee bit more efficient
than the temp table approach for my case.
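
For what it is worth, there also seems to be a single-statement way to
do this with MySQL's multi-table DELETE (4.0 and up), which would skip
both the temp table and the round trip to the client.  Again just a
sketch against the hypothetical my_table:

-- remove every row that has a newer duplicate (same my_param, higher id)
DELETE t1
  FROM my_table t1, my_table t2
 WHERE t2.my_param = t1.my_param
   AND t2.id > t1.id
   AND t1.my_param = 'foo';

Only the newest row for my_param = 'foo' survives the self-join.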

-- 
-------------------------------------------------------------
Scott Haneda                                Tel: 415.898.2602
<http://www.newgeo.com>                     Novato, CA U.S.A.


