Re: Duplicate Records

DL Neil Tue, 12 Feb 2002 16:58:47 -0800

Dan,
[I've cc-ed this to the list and to Rich - who asked the original question]


> You can do the following to remove duplicates:
>
> - create table foo as select distinct cols from
> table_name_containing_duplicates;
> - drop table_name_containing_duplicates;
> - alter table foo rename to table_name_containing_duplicates;
>
> Done.
>
> Now, make a primay key for the new table!! -- this will avoid duplicates.


This is one of a couple of rather neat SQL methods for dealing with duplicates - which 
strictly-speaking answers
Rich's question.

What is its disadvantage? The problem with a 'delete' technique is that no real record 
is kept of the action -
which in a commercial situation makes auditors and other 'suits' unhappy. Even in a 
personal situation it can
make life difficult because "you don't know what you've got till it's gone"!

Consequently when it comes to 'deletions', us worry-worts tend to take the 
longer-winded/more boring approach,
eg doing a SELECT before a DELETE, because then one is sure of two things: 1 the 
actual data being DELETEd (and
an audit trail for those that regard such things as 'light reading'); and 2 that there 
is no 'slight error' in
the deletion criteria (or assumptions about the data) which leads to a major 'oops'.

A visual inspection row by row (as I recommended) on a table of millions of records 
would be totally
impractical - automated methods would be the only way to go!

Regards,
=dn



> > Rich,
> >
> > > How does one go about removing one of two identical records in a MySQL
> > > database?  My mistake in an earlier database was not applying a unique
> > > number to each record (1, 2, 3, ---).  I know it's possible to use the
> > > DISTINCT operator to show only one of the identical records in a result,
> > > but that does not remove one of them from the database.
> >
> > =Ouch!
> >
> > =Do you have a particular criteria to delete one or other of any
> > duplicate records, or are they absolutely
> > identical (and therefore it doesn't matter which stays/goes)?
> >
> > =You cannot really risk automated deletion in either case!
> > - if the records are slightly different, criteria must be
> > established to determine which is right/should be
> > kept;
> > - if they are absolutely identical, how will you identify in a
> > WHERE clause that only one record of that
> > criteria is to be removed of two identical rows?
> >
> > =To identify the duplicate rows try:-
> >
> > SELECT column(s), COUNT(*) as duplicates
> > FROM tbl
> > GROUP BY key
> > HAVING duplicates > 1
> >
> > =Depending upon the number of rows returned, you could then feed
> > those keys into a series of DELETE ... LIMIT 1
> > commands - assuming duplicates are strictly 'two of'; otherwise
> > it might be easier to take the listing and do it
> > by hand (using a mgmt package).
> >
> > > One thought that I had would be to add a unique number to each record,
> > > and that could probably be done manually, one record at a time, but is
> > > there a way to automate the process so that unique numbers could be
> > > assigned with one command?
> >
> > =ALTER TABLE allows the addition of a new column. The only
> > question is whether to do it before (if it might help
> > the editing job) or after, weeding out the duplicates (to get a
> > more continuous AUTO_INCREMENT sequence, if it's
> > at all of interest).
> >
> > =Regards,
> > =dn



---------------------------------------------------------------------
Before posting, please check:
   http://www.mysql.com/manual.php   (the manual)
   http://lists.mysql.com/           (the list archive)

To request this thread, e-mail <[EMAIL PROTECTED]>
To unsubscribe, e-mail <[EMAIL PROTECTED]>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Re: Duplicate Records

Reply via email to