I think got it in the end by doing a union and a join. delete AA, PA from ACCOUNTACTION AA, ACCOUNTPAYMENTACTION PA where AA.ID = PA.ID and AA.ID in (select D.ID from (select A1.ID from ACCOUNTACTION A1 left join ( select * from ACCOUNTACTION A2 where A2.ACTIONDATE like '2008-08-01 02:00%' group by A2.ACCOUNT_ID having count(A2.ACCOUNT_ID) > 1 union select * from ACCOUNTACTION A3 where A3.ACTIONDATE like '2008-08-01 02:00%' group by A3.ACCOUNT_ID having count(A3.ACCOUNT_ID) = 1 ) as U1 on A1.ID = U1.ID where A1.ACTIONDATE like '2008-08-01 02:00%' and U1.ID is NULL ) as D );
Thanks for the pointers ;-) -----Original Message----- From: Magnus Smith [mailto:[EMAIL PROTECTED] Sent: 07 August 2008 10:35 To: Ananda Kumar Cc: mysql@lists.mysql.com Subject: RE: removing duplicate entries Yes I can see you are correct. I tried setting up a little test case myself. CREATE TABLE ACCOUNTACTION ( ID INT NOT NULL PRIMARY KEY, ACTIONDATE DATETIME, ACCOUNT_ID INT NOT NULL ); CREATE TABLE ACCOUNTPAYMENTACTION ( ID INT NOT NULL PRIMARY KEY, AMOUNT INT ); INSERT INTO ACCOUNTACTION (ID, ACTIONDATE, ACCOUNT_ID) VALUES('001', '2008-08-01 02:00:00', '101'), ('002', '2008-08-01 02:00:00', '101'), ('003', '2008-08-01 02:00:00', '101'), ('004', '2008-08-01 02:00:00', '102'), ('005', '2008-08-01 02:00:00', '103'), ('006', '2008-08-01 02:00:00', '104'), ('007', '2008-08-01 02:00:00', '104'), ('008', '2008-08-01 02:00:00', '105'), ('009', '2008-08-01 03:00:00', '104'), ('010', '2008-08-01 03:00:00', '105'), ('011', '2008-08-01 02:00:00', '106'); INSERT INTO ACCOUNTPAYMENTACTION (ID, AMOUNT) VALUES('001', '1000'), ('002', '1000'), ('003', '1000'), ('004', '1000'), ('005', '1000'), ('006', '1000'), ('007', '1000'), ('008', '1000'), ('009', '1000'), ('010', '1000'), ('011', '1000'); I got the following query that seems to work on my test case. I create a union of everything that is not a duplicate and then take the ones that are not in this to be the duplicates select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' and (ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID) not in (select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' group by ACCOUNTACTION.ACCOUNT_ID having count(ACCOUNTACTION.ACCOUNT_ID) > 1 union select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' group by ACCOUNTACTION.ACCOUNT_ID having count(ACCOUNTACTION.ACCOUNT_ID) = 1); The problem is that when I try to apply this to my real database tables that are quite large then the query does not return. I am thinking that there must be a more efficient way to write the query? I would be pleased to hear any suggestions - thanks ________________________________ From: Ananda Kumar [mailto:[EMAIL PROTECTED] Sent: 06 August 2008 13:51 To: Magnus Smith Cc: mysql@lists.mysql.com Subject: Re: removing duplicate entries I just did a test case here select * from amc_25; +------+ | id | +------+ | 2 | | 14 | | 1 | | 2 | +------+ 4 rows in set (0.01 sec) select id from amc_25 group by id having count(id) >1 and id!=min(id); Empty set (0.00 sec) It does not give me any rows. R u sure the rows returned, are the one you want to keep are indeed duplicates On 8/6/08, Magnus Smith <[EMAIL PROTECTED]> wrote: When I try the first suggestion (i) then I get all the 1682 duplicate rows. The thing is that I need to keep the originals which are the ones with the lowest ACCOUNTACTION.ID <http://accountaction.id/> value. The second suggestion (ii) gives me 563 rows that are the duplicates with the lowest ACCOUNTACTION.ID <http://accountaction.id/> which are the ones I wish to keep So the ones I want to delete are the ones in (i) and not (ii) When I use select ACCOUNTACTION.ID <http://accountaction.id/> from ACCOUNTACTION where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' group by ACCOUNTACTION.ACCOUNT_ID having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID <http://accountaction.id/> != min(ACCOUNTACTION.ID <http://accountaction.id/> )); then I get 1119 rows which is all the duplicates in (i) less the originals in (ii) The problem I'm having is using this in a delete statement. ________________________________ From: Ananda Kumar [mailto:[EMAIL PROTECTED] Sent: 06 August 2008 10:11 To: Magnus Smith Cc: mysql@lists.mysql.com Subject: Re: removing duplicate entries I doubt the belwo sql will give you duplcates select ACCOUNTACTION.ID <http://accountaction.id/> from ACCOUNTACTION where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' group by ACCOUNTACTION.ACCOUNT_ID having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID <http://accountaction.id/> != min(ACCOUNTACTION.ID <http://accountaction.id/> )); The reason being, for duplicates records accountaction.id <http://accountaction.id/> will always equal to min(accountaction.id <http://accountaction.id/> ). try this select ACCOUNTACTION.ID <http://accountaction.id/> from ACCOUNTACTION where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' group by ACCOUNTACTION.ACCOUNT_ID having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 or select ACCOUNTACTION.ID <http://accountaction.id/> from ACCOUNTACTION where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' group by ACCOUNTACTION.ACCOUNT_ID having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID <http://accountaction.id/> =min(accountaction.id <http://accountaction.id/> ); I would use the first select statement. On 8/6/08, Magnus Smith <[EMAIL PROTECTED]> wrote: I have the following two tables ACCOUNTACTION +-------------------+--------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-------------------+--------------+------+-----+---------+-------+ | ID | bigint(20) | NO | PRI | | | | AccountActionType | varchar(31) | YES | | NULL | | | DESCRIPTION | varchar(255) | YES | | NULL | | | ACTIONDATE | datetime | YES | | NULL | | | ACCOUNT_ID | bigint(20) | YES | MUL | NULL | | +-------------------+--------------+------+-----+---------+-------+ and ACCOUNTPAYMENTACTION +---------------+------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +---------------+------------+------+-----+---------+-------+ | ID | bigint(20) | NO | PRI | | | | AMOUNTINPENCE | bigint(20) | YES | | NULL | | +---------------+------------+------+-----+---------+-------+ ACCOUNTPAYMENTACTION shares the primary key with ACCOUNTACTION I need to remove duplicate entries that occured at a specific time in ACCOUNTACTION I then plan to remove the rows in ACCOUNTPAYMENTACTION that are no longer referenced in ACCOUNTACTION by using an outer join I can select the duplicate records in ACCOUNTACTION using select ACCOUNTACTION.ID <http://accountaction.id/> from ACCOUNTACTION where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' group by ACCOUNTACTION.ACCOUNT_ID having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID <http://accountaction.id/> != min(ACCOUNTACTION.ID <http://accountaction.id/> )); I am trying to delete these records but am having trouble with the sql delete I tried the following but nothing happened delete ACCOUNTACTION where ACCOUNTACTION.ID <http://accountaction.id/> in (select ACCOUNTACTION.ID <http://accountaction.id/> from ACCOUNTACTION where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' group by ACCOUNTACTION.ACCOUNT_ID having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID <http://accountaction.id/> != min(ACCOUNTACTION.ID <http://accountaction.id/> ))); Can anyone help me? -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]