Re: SQL statement to find and delete double entries
Hello, i am definitely sure that the statement below would mess up your data. The rowid is an internal (physical) access path to your data and it is fatal to use it as a logical sorting criteria. To delete ALL duplicates: see my suggestion below. To leave the first element in your table, you will at first have to define, which one actually IS the first duplicate in your table. For example you may only consider your key field 'id': REM get/check the 'first' of your duplicates (key-rowid-pairs): select id, min(rowid), count(*) from table group by id having count(*) 1; REM delete duplikates, ignoring the 'first' REM rememer: i did not actually check this code; but i think it is OK; youn may let me know... delete from table where id in ( select id from table group by id having count(*) 1 ) and (id, rowid) ( select id, min(rowid) from table group by id having count(*) 1 ); On the other hand you my define the 'first' as a pair of two (or more) fields. I did not check this... But in every case you will loose the information stored in the extra fields of your table. I would rather suggest to seriously review your data model ... cu, Christian - Original Message - From: [EMAIL PROTECTED] To: Tim Bunce [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Monday, September 22, 2003 6:36 PM Subject: Re: SQL statement to find and delete double entries Thanks, Tim. Adding Oracle to your search yielded the following quickly. delete from T t1 where t1.rowid ( select min(t2.rowID) from T t2 where t1.col1 = t2.col1 and t1.col2 = t2.col2); I ought to know better and just go googly early. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Tim Bunce [EMAIL PROTECTED] 09/19/2003 03:09 PM To: Jeffrey Seger/Corporate/[EMAIL PROTECTED] cc: Christian Merz [EMAIL PROTECTED], [EMAIL PROTECTED], Morrison, Trevor (Trevor) [EMAIL PROTECTED] Subject:Re: SQL statement to find and delete double entries It's a common problem. You can start here: http://www.google.com/search?as_q=sql+delete+duplicate and add the name of the database your using. Tim. On Fri, Sep 19, 2003 at 01:31:20PM -0400, [EMAIL PROTECTED] wrote: The only problem with that approach is that it deletes all of the entries and doesn't leave singles behind. I'd probably do it programatically. Grab the results of query 1, store the data in a hash of hashes, then do the delete and re-insert. But I'd love to hear an SQL solution to leaving one copy of each duplicate behind. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Christian Merz [EMAIL PROTECTED] 09/18/2003 08:33 AM To: Morrison, Trevor (Trevor) [EMAIL PROTECTED], [EMAIL PROTECTED] cc: Subject:Re: SQL statement to find and delete double entries Hi, the basic idea to find duplicate or multiple values is: select id, count(*) from table group by id having count(*) 1; to delete ALL such values you may do this: delete from table where id in ( select id from table group by id having count(*) 1 ); cu, Christian - Original Message - From: Morrison, Trevor (Trevor) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 16, 2003 6:39 PM Subject: SQL statement to find and delete double entries Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor
Re: SQL statement to find and delete double entries
Christian: I'm sure you are right. There is no proper join criteria in the statement I found on google. Anyway, this has wandered far enough off topic for something that is not an urgent need for me right now. Thanks. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Christian Merz [EMAIL PROTECTED] 09/23/2003 09:49 AM To: Jeffrey Seger/Corporate/[EMAIL PROTECTED] cc: [EMAIL PROTECTED], [EMAIL PROTECTED] Subject:Re: SQL statement to find and delete double entries Hello, i am definitely sure that the statement below would mess up your data. The rowid is an internal (physical) access path to your data and it is fatal to use it as a logical sorting criteria. To delete ALL duplicates: see my suggestion below. To leave the first element in your table, you will at first have to define, which one actually IS the first duplicate in your table. For example you may only consider your key field 'id': REM get/check the 'first' of your duplicates (key-rowid-pairs): select id, min(rowid), count(*) from table group by id having count(*) 1; REM delete duplikates, ignoring the 'first' REM rememer: i did not actually check this code; but i think it is OK; youn may let me know... delete from table where id in ( select id from table group by id having count(*) 1 ) and (id, rowid) ( select id, min(rowid) from table group by id having count(*) 1 ); On the other hand you my define the 'first' as a pair of two (or more) fields. I did not check this... But in every case you will loose the information stored in the extra fields of your table. I would rather suggest to seriously review your data model ... cu, Christian - Original Message - From: [EMAIL PROTECTED] To: Tim Bunce [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Monday, September 22, 2003 6:36 PM Subject: Re: SQL statement to find and delete double entries Thanks, Tim. Adding Oracle to your search yielded the following quickly. delete from T t1 where t1.rowid ( select min(t2.rowID) from T t2 where t1.col1 = t2.col1 and t1.col2 = t2.col2); I ought to know better and just go googly early. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Tim Bunce [EMAIL PROTECTED] 09/19/2003 03:09 PM To: Jeffrey Seger/Corporate/[EMAIL PROTECTED] cc: Christian Merz [EMAIL PROTECTED], [EMAIL PROTECTED], Morrison, Trevor (Trevor) [EMAIL PROTECTED] Subject:Re: SQL statement to find and delete double entries It's a common problem. You can start here: http://www.google.com/search?as_q=sql+delete+duplicate and add the name of the database your using. Tim. On Fri, Sep 19, 2003 at 01:31:20PM -0400, [EMAIL PROTECTED] wrote: The only problem with that approach is that it deletes all of the entries and doesn't leave singles behind. I'd probably do it programatically. Grab the results of query 1, store the data in a hash of hashes, then do the delete and re-insert. But I'd love to hear an SQL solution to leaving one copy of each duplicate behind. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Christian Merz [EMAIL PROTECTED] 09/18/2003 08:33 AM To: Morrison, Trevor (Trevor) [EMAIL PROTECTED], [EMAIL PROTECTED] cc: Subject:Re: SQL statement to find and delete double entries Hi, the basic idea to find duplicate or multiple values is: select id, count(*) from table group by id having count(*) 1; to delete ALL such values you may do this: delete from table where id in ( select id from table group by id having count(*) 1 ); cu, Christian - Original Message - From: Morrison, Trevor (Trevor) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 16, 2003 6:39 PM Subject: SQL statement to find and delete double entries Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor
Re: SQL statement to find and delete double entries
How about if you put a proper constraint on the table. Then it really doesn't matter how elegant you are in cleaning up your data, since the it should only happen once. -Ian Christian Merz wrote: Hello, i am definitely sure that the statement below would mess up your data. The rowid is an internal (physical) access path to your data and it is fatal to use it as a logical sorting criteria. To delete ALL duplicates: see my suggestion below. To leave the first element in your table, you will at first have to define, which one actually IS the first duplicate in your table. For example you may only consider your key field 'id': REM get/check the 'first' of your duplicates (key-rowid-pairs): select id, min(rowid), count(*) from table group by id having count(*) 1; REM delete duplikates, ignoring the 'first' REM rememer: i did not actually check this code; but i think it is OK; youn may let me know... delete from table where id in ( select id from table group by id having count(*) 1 ) and (id, rowid) ( select id, min(rowid) from table group by id having count(*) 1 ); On the other hand you my define the 'first' as a pair of two (or more) fields. I did not check this... But in every case you will loose the information stored in the extra fields of your table. I would rather suggest to seriously review your data model ... cu, Christian - Original Message - From: [EMAIL PROTECTED] To: Tim Bunce [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Monday, September 22, 2003 6:36 PM Subject: Re: SQL statement to find and delete double entries Thanks, Tim. Adding Oracle to your search yielded the following quickly. delete from T t1 where t1.rowid ( select min(t2.rowID) from T t2 where t1.col1 = t2.col1 and t1.col2 = t2.col2); I ought to know better and just go googly early. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Tim Bunce [EMAIL PROTECTED] 09/19/2003 03:09 PM To: Jeffrey Seger/Corporate/[EMAIL PROTECTED] cc: Christian Merz [EMAIL PROTECTED], [EMAIL PROTECTED], Morrison, Trevor (Trevor) [EMAIL PROTECTED] Subject:Re: SQL statement to find and delete double entries It's a common problem. You can start here: http://www.google.com/search?as_q=sql+delete+duplicate and add the name of the database your using. Tim. On Fri, Sep 19, 2003 at 01:31:20PM -0400, [EMAIL PROTECTED] wrote: The only problem with that approach is that it deletes all of the entries and doesn't leave singles behind. I'd probably do it programatically. Grab the results of query 1, store the data in a hash of hashes, then do the delete and re-insert. But I'd love to hear an SQL solution to leaving one copy of each duplicate behind. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Christian Merz [EMAIL PROTECTED] 09/18/2003 08:33 AM To: Morrison, Trevor (Trevor) [EMAIL PROTECTED], [EMAIL PROTECTED] cc: Subject:Re: SQL statement to find and delete double entries Hi, the basic idea to find duplicate or multiple values is: select id, count(*) from table group by id having count(*) 1; to delete ALL such values you may do this: delete from table where id in ( select id from table group by id having count(*) 1 ); cu, Christian - Original Message - From: Morrison, Trevor (Trevor) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 16, 2003 6:39 PM Subject: SQL statement to find and delete double entries Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor
Re: SQL statement to find and delete double entries
Jeff, Here is the syntax for doing this in Sybase as a stored procedure. Assumption: id is an int. CREATE PROCEDURE del_dupes AS BEGIN DECLARE @id int, @id_count int, @delete_rows int DECLARE read_keys CURSOR FOR SELECT id, count(*) FROM table GROUP BY id HAVING count(*) 1 OPEN read_keys FETCH read_keys INTO @id, @id_count WHILE( @@sqlstatus = 0 ) BEGIN SELECT @delete_rows = @id_count - 1 SET ROWCOUNT @delete_rows DELETE table WHERE id = @id FETCH read_keys INTO @id, @id_count END END GO Stored procedures are the fastest way to do it. As the data never leaves the server. Check with your RDBMS on correct syntax for your particular db. The same steps could be done in perl, but the data for the select is transmitted to the client and then each delete statement is passed, compiled, optimized and then executed. HTH, Chuck Fox Principal DBA America Online, INC [EMAIL PROTECTED] wrote: The only problem with that approach is that it deletes all of the entries and doesn't leave singles behind. I'd probably do it programatically. Grab the results of query 1, store the data in a hash of hashes, then do the delete and re-insert. But I'd love to hear an SQL solution to leaving one copy of each duplicate behind. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Christian Merz [EMAIL PROTECTED] 09/18/2003 08:33 AM To: Morrison, Trevor (Trevor) [EMAIL PROTECTED], [EMAIL PROTECTED] cc: Subject:Re: SQL statement to find and delete double entries Hi, the basic idea to find duplicate or multiple values is: select id, count(*) from table group by id having count(*) 1; to delete ALL such values you may do this: delete from table where id in ( select id from table group by id having count(*) 1 ); cu, Christian - Original Message - From: Morrison, Trevor (Trevor) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 16, 2003 6:39 PM Subject: SQL statement to find and delete double entries Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor
RE: SQL statement to find and delete double entries
You don't mention your RDBMS. But some RDBMS (and I presume all internally) include a unique row identifier which you can access (e.g Ingres - tid tuple-id) If this is available, then the rows are no longer 'unique' and you can decide which row you want to get rid of e.g max/min(tid) group by my uniqueness having count(*) 1 in a one-off SQL statement, thereby avoiding front-end/back-end and programming solutions. I suggest you check your RDBMS DBA/SQL guides. -Original Message- From: Chuck Fox [mailto:[EMAIL PROTECTED] Sent: 22 September 2003 15:45 To: [EMAIL PROTECTED] Cc: Christian Merz; [EMAIL PROTECTED]; Morrison, Trevor (Trevor) Subject: Re: SQL statement to find and delete double entries Jeff, Here is the syntax for doing this in Sybase as a stored procedure. Assumption: id is an int. CREATE PROCEDURE del_dupes AS BEGIN DECLARE @id int, @id_count int, @delete_rows int DECLARE read_keys CURSOR FOR SELECT id, count(*) FROM table GROUP BY id HAVING count(*) 1 OPEN read_keys FETCH read_keys INTO @id, @id_count WHILE( @@sqlstatus = 0 ) BEGIN SELECT @delete_rows = @id_count - 1 SET ROWCOUNT @delete_rows DELETE table WHERE id = @id FETCH read_keys INTO @id, @id_count END END GO Stored procedures are the fastest way to do it. As the data never leaves the server. Check with your RDBMS on correct syntax for your particular db. The same steps could be done in perl, but the data for the select is transmitted to the client and then each delete statement is passed, compiled, optimized and then executed. HTH, Chuck Fox Principal DBA America Online, INC [EMAIL PROTECTED] wrote: The only problem with that approach is that it deletes all of the entries and doesn't leave singles behind. I'd probably do it programatically. Grab the results of query 1, store the data in a hash of hashes, then do the delete and re-insert. But I'd love to hear an SQL solution to leaving one copy of each duplicate behind. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Christian Merz [EMAIL PROTECTED] 09/18/2003 08:33 AM To: Morrison, Trevor (Trevor) [EMAIL PROTECTED], [EMAIL PROTECTED] cc: Subject:Re: SQL statement to find and delete double entries Hi, the basic idea to find duplicate or multiple values is: select id, count(*) from table group by id having count(*) 1; to delete ALL such values you may do this: delete from table where id in ( select id from table group by id having count(*) 1 ); cu, Christian - Original Message - From: Morrison, Trevor (Trevor) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 16, 2003 6:39 PM Subject: SQL statement to find and delete double entries Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor Internet communications are not secure and therefore the Barclays Group does not accept legal responsibility for the contents of this message. Although the Barclays Group operates anti-virus programmes, it does not accept responsibility for any damage whatsoever that is caused by viruses being passed. Any views or opinions presented are solely those of the author and do not necessarily represent those of the Barclays Group. Replies to this email may be monitored by the Barclays Group for operational or business reasons.
RE: SQL statement to find and delete double entries
If you are using Sybase, you can delete duplicate rows from a table by creating a unique clustered index 'with ignore_dup_row' (and then dropping it again if you don't want to ignore later duplicate inserts). I haven't benchmarked but I think this will be somewhat faster than a stored procedure. Anyway, this gets offtopic for the DBI list - if you want to write some SQL at the client for it, ask an SQL mailing list, and if you want to get the best performance on a particular RDBMS, ask a list for that system. -- Ed Avis [EMAIL PROTECTED]
Re: SQL statement to find and delete double entries
Avis, The use of a unique clustered index is an interesting solution, however, it requires 1.2 times the space currently consumed by the table and rewrites the table. IMHO, unless we are talking a major duplication i.e. 90% dupes, creating the clustered index will be slower. Chuck [EMAIL PROTECTED] wrote: If you are using Sybase, you can delete duplicate rows from a table by creating a unique clustered index 'with ignore_dup_row' (and then dropping it again if you don't want to ignore later duplicate inserts). I haven't benchmarked but I think this will be somewhat faster than a stored procedure. Anyway, this gets offtopic for the DBI list - if you want to write some SQL at the client for it, ask an SQL mailing list, and if you want to get the best performance on a particular RDBMS, ask a list for that system.
Re: SQL statement to find and delete double entries
Thanks, Tim. Adding Oracle to your search yielded the following quickly. delete from T t1 where t1.rowid ( select min(t2.rowID) from T t2 where t1.col1 = t2.col1 and t1.col2 = t2.col2); I ought to know better and just go googly early. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Tim Bunce [EMAIL PROTECTED] 09/19/2003 03:09 PM To: Jeffrey Seger/Corporate/[EMAIL PROTECTED] cc: Christian Merz [EMAIL PROTECTED], [EMAIL PROTECTED], Morrison, Trevor (Trevor) [EMAIL PROTECTED] Subject:Re: SQL statement to find and delete double entries It's a common problem. You can start here: http://www.google.com/search?as_q=sql+delete+duplicate and add the name of the database your using. Tim. On Fri, Sep 19, 2003 at 01:31:20PM -0400, [EMAIL PROTECTED] wrote: The only problem with that approach is that it deletes all of the entries and doesn't leave singles behind. I'd probably do it programatically. Grab the results of query 1, store the data in a hash of hashes, then do the delete and re-insert. But I'd love to hear an SQL solution to leaving one copy of each duplicate behind. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Christian Merz [EMAIL PROTECTED] 09/18/2003 08:33 AM To: Morrison, Trevor (Trevor) [EMAIL PROTECTED], [EMAIL PROTECTED] cc: Subject:Re: SQL statement to find and delete double entries Hi, the basic idea to find duplicate or multiple values is: select id, count(*) from table group by id having count(*) 1; to delete ALL such values you may do this: delete from table where id in ( select id from table group by id having count(*) 1 ); cu, Christian - Original Message - From: Morrison, Trevor (Trevor) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 16, 2003 6:39 PM Subject: SQL statement to find and delete double entries Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor
Re: SQL statement to find and delete double entries
The only problem with that approach is that it deletes all of the entries and doesn't leave singles behind. I'd probably do it programatically. Grab the results of query 1, store the data in a hash of hashes, then do the delete and re-insert. But I'd love to hear an SQL solution to leaving one copy of each duplicate behind. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Christian Merz [EMAIL PROTECTED] 09/18/2003 08:33 AM To: Morrison, Trevor (Trevor) [EMAIL PROTECTED], [EMAIL PROTECTED] cc: Subject:Re: SQL statement to find and delete double entries Hi, the basic idea to find duplicate or multiple values is: select id, count(*) from table group by id having count(*) 1; to delete ALL such values you may do this: delete from table where id in ( select id from table group by id having count(*) 1 ); cu, Christian - Original Message - From: Morrison, Trevor (Trevor) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 16, 2003 6:39 PM Subject: SQL statement to find and delete double entries Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor
Re: SQL statement to find and delete double entries
It's a common problem. You can start here: http://www.google.com/search?as_q=sql+delete+duplicate and add the name of the database your using. Tim. On Fri, Sep 19, 2003 at 01:31:20PM -0400, [EMAIL PROTECTED] wrote: The only problem with that approach is that it deletes all of the entries and doesn't leave singles behind. I'd probably do it programatically. Grab the results of query 1, store the data in a hash of hashes, then do the delete and re-insert. But I'd love to hear an SQL solution to leaving one copy of each duplicate behind. Jeff Seger Fairchild Semiconductor [EMAIL PROTECTED] Christian Merz [EMAIL PROTECTED] 09/18/2003 08:33 AM To: Morrison, Trevor (Trevor) [EMAIL PROTECTED], [EMAIL PROTECTED] cc: Subject:Re: SQL statement to find and delete double entries Hi, the basic idea to find duplicate or multiple values is: select id, count(*) from table group by id having count(*) 1; to delete ALL such values you may do this: delete from table where id in ( select id from table group by id having count(*) 1 ); cu, Christian - Original Message - From: Morrison, Trevor (Trevor) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 16, 2003 6:39 PM Subject: SQL statement to find and delete double entries Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor
Re: SQL statement to find and delete double entries
Hi, the basic idea to find duplicate or multiple values is: select id, count(*) from table group by id having count(*) 1; to delete ALL such values you may do this: delete from table where id in ( select id from table group by id having count(*) 1 ); cu, Christian - Original Message - From: Morrison, Trevor (Trevor) [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 16, 2003 6:39 PM Subject: SQL statement to find and delete double entries Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor
SQL statement to find and delete double entries
Hi, What would be an SQL statement that will find duplicate order numbers in table and then delete them? TIA Trevor
Re: SQL statement to find and delete double entries
--- Morrison, Trevor (Trevor) [EMAIL PROTECTED] wrote: What would be an SQL statement that will find duplicate order numbers in table and then delete them? Try a google statement, like sql delete duplicate rows __ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com